All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
	Jim Schutt <jaschut@sandia.gov>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/6] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures
Date: Fri, 10 Aug 2012 17:48:29 +0900	[thread overview]
Message-ID: <20120810084829.GF21033@bbox> (raw)
In-Reply-To: <20120810083438.GM12690@suse.de>

Hi Mel,

On Fri, Aug 10, 2012 at 09:34:38AM +0100, Mel Gorman wrote:
> On Fri, Aug 10, 2012 at 08:27:33AM +0900, Minchan Kim wrote:
> > > <SNIP>
> > >
> > > The intention is that an allocation can fail but each subsequent attempt will
> > > try harder until there is success. Each allocation request does a portion
> > > of the necessary work to spread the cost between multiple requests. Take
> > > THP for example where there is a constant request for THP allocations
> > > for whatever reason (heavy fork workload, large buffer allocation being
> > > populated etc.). Some of those allocations fail but if they do, future
> > > THP requests will reclaim more pages. When compaction resumes again, it
> > > will be more likely to succeed and compact_defer_shift gets reset. In the
> > > specific case of THP there will be allocations that fail but khugepaged
> > > will promote them later if the process is long-lived.
> > 
> > You assume high-order allocation are *constant* and I guess your test enviroment
> > is optimal for it.
> 
> Ok, my example stated they were constant because it was the easiest to
> illustrate but it does not necessarily have to be the case. The high-order
> allocation requests can be separated by any length of time with a read or
> write stream running in the background applying a small amount of memory
> pressure and the same scenario applies.
> 
> > I agree your patch if we can make sure such high-order
> > allocation are always constant. But, is it true? Otherwise, your patch could reclaim
> > too many pages unnecessary and it could reduce system performance by eviction
> 
> The "too many pages unnecessarily" is unlikely. For compact_defer_shift to be
> elevated there has to have been recent failures by try_to_compact_pages(). If
> compact_defer_shift is elevated and a large process exited then
> try_to_compact_pages() may succeed and reset compact_defer_shift without
> calling direct reclaim and entering this path at all.
> 
> > of page cache and swap out of workingset part. That's a concern to me.
> > In summary, I think your patch is rather agressive so how about this?
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 66e4310..0cb2593 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1708,6 +1708,7 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> >  {
> >         unsigned long pages_for_compaction;
> >         unsigned long inactive_lru_pages;
> > +       struct zone *zone;
> > 
> >         /* If not in reclaim/compaction mode, stop */
> >         if (!in_reclaim_compaction(sc))
> > @@ -1741,6 +1742,15 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> >          * inactive lists are large enough, continue reclaiming
> >          */
> >         pages_for_compaction = (2UL << sc->order);
> > +
> > +       /*
> > +        * If compaction is deferred for this order then scale the number of
> > +        * pages reclaimed based on the number of consecutive allocation
> > +        * failures
> > +        */
> > +       zone = lruvec_zone(lruvec);
> > +       if (zone->compact_order_failed <= sc->order) {
> > +               if (zone->compact_defer_shift)
> > +                       /*
> > +                        * We can't make sure deferred requests will come again
> > +                        * The probability is 50:50.
> > +                        */
> > +                       pages_for_compaction <<= (zone->compact_defer_shift - 1);
> 
> This patch is not doing anything radically different to my own patch.
> compact_defer_shift == 0 if allocations succeeded recently using
> reclaim/compaction at its normal level. Functionally the only difference
> is that you delay when more pages get reclaim by one failure.
> 
> Was that what you intended? If so, it's not clear why you think this patch
> is better or how you concluded that the probability of another failure was
> "50:50".

Please ignore my comment about this patch.
I got confused between compat_considered and compact_defer_shift.
compact_defer_shift is indication of constant high order page
allocationfailing so I have no objection any more.
Sorry for the noise. :(

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
	Jim Schutt <jaschut@sandia.gov>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/6] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures
Date: Fri, 10 Aug 2012 17:48:29 +0900	[thread overview]
Message-ID: <20120810084829.GF21033@bbox> (raw)
In-Reply-To: <20120810083438.GM12690@suse.de>

Hi Mel,

On Fri, Aug 10, 2012 at 09:34:38AM +0100, Mel Gorman wrote:
> On Fri, Aug 10, 2012 at 08:27:33AM +0900, Minchan Kim wrote:
> > > <SNIP>
> > >
> > > The intention is that an allocation can fail but each subsequent attempt will
> > > try harder until there is success. Each allocation request does a portion
> > > of the necessary work to spread the cost between multiple requests. Take
> > > THP for example where there is a constant request for THP allocations
> > > for whatever reason (heavy fork workload, large buffer allocation being
> > > populated etc.). Some of those allocations fail but if they do, future
> > > THP requests will reclaim more pages. When compaction resumes again, it
> > > will be more likely to succeed and compact_defer_shift gets reset. In the
> > > specific case of THP there will be allocations that fail but khugepaged
> > > will promote them later if the process is long-lived.
> > 
> > You assume high-order allocation are *constant* and I guess your test enviroment
> > is optimal for it.
> 
> Ok, my example stated they were constant because it was the easiest to
> illustrate but it does not necessarily have to be the case. The high-order
> allocation requests can be separated by any length of time with a read or
> write stream running in the background applying a small amount of memory
> pressure and the same scenario applies.
> 
> > I agree your patch if we can make sure such high-order
> > allocation are always constant. But, is it true? Otherwise, your patch could reclaim
> > too many pages unnecessary and it could reduce system performance by eviction
> 
> The "too many pages unnecessarily" is unlikely. For compact_defer_shift to be
> elevated there has to have been recent failures by try_to_compact_pages(). If
> compact_defer_shift is elevated and a large process exited then
> try_to_compact_pages() may succeed and reset compact_defer_shift without
> calling direct reclaim and entering this path at all.
> 
> > of page cache and swap out of workingset part. That's a concern to me.
> > In summary, I think your patch is rather agressive so how about this?
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 66e4310..0cb2593 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1708,6 +1708,7 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> >  {
> >         unsigned long pages_for_compaction;
> >         unsigned long inactive_lru_pages;
> > +       struct zone *zone;
> > 
> >         /* If not in reclaim/compaction mode, stop */
> >         if (!in_reclaim_compaction(sc))
> > @@ -1741,6 +1742,15 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> >          * inactive lists are large enough, continue reclaiming
> >          */
> >         pages_for_compaction = (2UL << sc->order);
> > +
> > +       /*
> > +        * If compaction is deferred for this order then scale the number of
> > +        * pages reclaimed based on the number of consecutive allocation
> > +        * failures
> > +        */
> > +       zone = lruvec_zone(lruvec);
> > +       if (zone->compact_order_failed <= sc->order) {
> > +               if (zone->compact_defer_shift)
> > +                       /*
> > +                        * We can't make sure deferred requests will come again
> > +                        * The probability is 50:50.
> > +                        */
> > +                       pages_for_compaction <<= (zone->compact_defer_shift - 1);
> 
> This patch is not doing anything radically different to my own patch.
> compact_defer_shift == 0 if allocations succeeded recently using
> reclaim/compaction at its normal level. Functionally the only difference
> is that you delay when more pages get reclaim by one failure.
> 
> Was that what you intended? If so, it's not clear why you think this patch
> is better or how you concluded that the probability of another failure was
> "50:50".

Please ignore my comment about this patch.
I got confused between compat_considered and compact_defer_shift.
compact_defer_shift is indication of constant high order page
allocationfailing so I have no objection any more.
Sorry for the noise. :(

-- 
Kind regards,
Minchan Kim

  reply	other threads:[~2012-08-10  8:46 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-07 12:31 [RFC PATCH 0/6] Improve hugepage allocation success rates under load Mel Gorman
2012-08-07 12:31 ` Mel Gorman
2012-08-07 12:31 ` [PATCH 1/6] mm: compaction: Update comment in try_to_compact_pages Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 13:19   ` Rik van Riel
2012-08-07 13:19     ` Rik van Riel
2012-08-07 23:25   ` Minchan Kim
2012-08-07 23:25     ` Minchan Kim
2012-08-07 12:31 ` [PATCH 2/6] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 13:23   ` Rik van Riel
2012-08-07 13:23     ` Rik van Riel
2012-08-08  1:48   ` Minchan Kim
2012-08-08  1:48     ` Minchan Kim
2012-08-08  7:55     ` Mel Gorman
2012-08-08  7:55       ` Mel Gorman
2012-08-08  8:27       ` Minchan Kim
2012-08-08  8:27         ` Minchan Kim
2012-08-08  8:51         ` Mel Gorman
2012-08-08  8:51           ` Mel Gorman
2012-08-08 23:51           ` Minchan Kim
2012-08-08 23:51             ` Minchan Kim
2012-08-09  7:49             ` Mel Gorman
2012-08-09  7:49               ` Mel Gorman
2012-08-09  8:27               ` Minchan Kim
2012-08-09  8:27                 ` Minchan Kim
2012-08-09  9:20                 ` Mel Gorman
2012-08-09  9:20                   ` Mel Gorman
2012-08-09 20:29                   ` Rik van Riel
2012-08-09 20:29                     ` Rik van Riel
2012-08-10  8:14                     ` Mel Gorman
2012-08-10  8:14                       ` Mel Gorman
2012-08-09 23:27                   ` Minchan Kim
2012-08-09 23:27                     ` Minchan Kim
2012-08-10  8:34                     ` Mel Gorman
2012-08-10  8:34                       ` Mel Gorman
2012-08-10  8:48                       ` Minchan Kim [this message]
2012-08-10  8:48                         ` Minchan Kim
2012-08-07 12:31 ` [PATCH 3/6] mm: kswapd: Continue reclaiming for reclaim/compaction if the minimum number of pages have not been reclaimed Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 13:26   ` Rik van Riel
2012-08-07 13:26     ` Rik van Riel
2012-08-08  2:07   ` Minchan Kim
2012-08-08  2:07     ` Minchan Kim
2012-08-08  9:07     ` Mel Gorman
2012-08-08  9:07       ` Mel Gorman
2012-08-08  9:58       ` Mel Gorman
2012-08-08  9:58         ` Mel Gorman
2012-08-07 12:31 ` [PATCH 4/6] mm: compaction: Capture a suitable high-order page immediately when it is made available Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 13:30   ` Rik van Riel
2012-08-07 13:30     ` Rik van Riel
2012-08-07 12:31 ` [PATCH 5/6] mm: have order > 0 compaction start off where it left Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 12:31 ` [PATCH 6/6] mm: have order > 0 compaction start near a pageblock with free pages Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 14:45   ` Rik van Riel
2012-08-07 14:45     ` Rik van Riel
2012-08-07 14:52     ` Mel Gorman
2012-08-07 14:52       ` Mel Gorman
2012-08-07 15:20       ` Jim Schutt
2012-08-07 15:20         ` Jim Schutt
2012-08-07 15:45         ` Mel Gorman
2012-08-07 15:45           ` Mel Gorman
2012-08-08  4:36   ` Minchan Kim
2012-08-08  4:36     ` Minchan Kim
2012-08-08 10:18     ` Mel Gorman
2012-08-08 10:18       ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120810084829.GF21033@bbox \
    --to=minchan@kernel.org \
    --cc=jaschut@sandia.gov \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.