Re: [patch 3/3] mm: page_alloc: fair zone allocator policy

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrea Arcangeli <aarcange@redhat.com>
To: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 3/3] mm: page_alloc: fair zone allocator policy
Date: Thu, 1 Aug 2013 17:51:11 +0200	[thread overview]
Message-ID: <20130801155111.GO25926@redhat.com> (raw)
In-Reply-To: <51F9E4A6.2090909@redhat.com>

On Thu, Aug 01, 2013 at 12:31:34AM -0400, Rik van Riel wrote:
> On 07/31/2013 10:56 PM, Minchan Kim wrote:
> 
> > Yes, it's not really slow path because it could return to normal status
> > without calling significant slow functions by reset batchcount of
> > prepare_slowpath.
> >
> > I think it's tradeoff and I am biased your approach although we would
> > lose a little performance because fair aging would recover the loss by
> > fastpath's overhead. But who knows? Someone has a concern.
> >
> > So we should mention about such problems.
> 
> If the atomic operation in the fast path turns out to be a problem,
> I suspect we may be able to fix it by using per-cpu counters, and
> consolidating those every once in a while.
> 
> However, it may be good to see whether there is a problem in the
> first place, before adding complexity.

prepare_slowpath is racy anyway, so I don't see the point in wanting
to do atomic ops in the first place.

Much better to do the increment in assembly in a single insn to reduce
the window, but there's no point in using the lock prefix when
atomic_set will screw it over while somebody does atomic_dec under it
anyway. atomic_set cannot be atomic, so it'll screw over any
perfectly accurate accounted atomic_dec anyway.

I think it should be done in C without atomic_dec or by introducing an
__atomic_dec that isn't using the lock prefix and just tries to do the
dec in a single (or as few as possible) asm insn.

And because the whole thing is racy, after prepare_slowpath (which I
think is a too generic name and should be renamed
reset_zonelist_alloc_batch), this last attempt before invoking
reclaim:

	/* This is the last chance, in general, before the goto nopage. */
	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
			high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
			preferred_zone, migratetype);

which is trying again with the MIN wmark, should be also altered so
that we do one pass totally disregarding alloc_batch, just before
declaring MIN-wmark failure in the allocation, and invoking direct
reclaim.

Either we add a (alloc_flags | ALLOC_NO_ALLOC_BATCH) to the above
alloc_flags in the above call, or we run a second
get_page_from_freelist if the above one fails with
ALLOC_NO_ALLOC_BATCH set.

Otherwise because of the inherent racy behavior of the alloc_batch
logic, what will happen is that we'll eventually have enough hugepage
allocations in enough CPUs to hit the alloc_batch limit again in
between the prepare_slowpath and the above get_page_from_freelist,
that will lead us to call spurious direct_reclaim invocations, when it
was just a false negative of alloc_batch.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Andrea Arcangeli <aarcange@redhat.com>
To: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 3/3] mm: page_alloc: fair zone allocator policy
Date: Thu, 1 Aug 2013 17:51:11 +0200	[thread overview]
Message-ID: <20130801155111.GO25926@redhat.com> (raw)
In-Reply-To: <51F9E4A6.2090909@redhat.com>

On Thu, Aug 01, 2013 at 12:31:34AM -0400, Rik van Riel wrote:
> On 07/31/2013 10:56 PM, Minchan Kim wrote:
> 
> > Yes, it's not really slow path because it could return to normal status
> > without calling significant slow functions by reset batchcount of
> > prepare_slowpath.
> >
> > I think it's tradeoff and I am biased your approach although we would
> > lose a little performance because fair aging would recover the loss by
> > fastpath's overhead. But who knows? Someone has a concern.
> >
> > So we should mention about such problems.
> 
> If the atomic operation in the fast path turns out to be a problem,
> I suspect we may be able to fix it by using per-cpu counters, and
> consolidating those every once in a while.
> 
> However, it may be good to see whether there is a problem in the
> first place, before adding complexity.

prepare_slowpath is racy anyway, so I don't see the point in wanting
to do atomic ops in the first place.

Much better to do the increment in assembly in a single insn to reduce
the window, but there's no point in using the lock prefix when
atomic_set will screw it over while somebody does atomic_dec under it
anyway. atomic_set cannot be atomic, so it'll screw over any
perfectly accurate accounted atomic_dec anyway.

I think it should be done in C without atomic_dec or by introducing an
__atomic_dec that isn't using the lock prefix and just tries to do the
dec in a single (or as few as possible) asm insn.

And because the whole thing is racy, after prepare_slowpath (which I
think is a too generic name and should be renamed
reset_zonelist_alloc_batch), this last attempt before invoking
reclaim:

	/* This is the last chance, in general, before the goto nopage. */
	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
			high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
			preferred_zone, migratetype);

which is trying again with the MIN wmark, should be also altered so
that we do one pass totally disregarding alloc_batch, just before
declaring MIN-wmark failure in the allocation, and invoking direct
reclaim.

Either we add a (alloc_flags | ALLOC_NO_ALLOC_BATCH) to the above
alloc_flags in the above call, or we run a second
get_page_from_freelist if the above one fails with
ALLOC_NO_ALLOC_BATCH set.

Otherwise because of the inherent racy behavior of the alloc_batch
logic, what will happen is that we'll eventually have enough hugepage
allocations in enough CPUs to hit the alloc_batch limit again in
between the prepare_slowpath and the above get_page_from_freelist,
that will lead us to call spurious direct_reclaim invocations, when it
was just a false negative of alloc_batch.

next prev parent reply	other threads:[~2013-08-01 15:51 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-19 20:55 [patch 0/3] mm: improve page aging fairness between zones/nodes Johannes Weiner
2013-07-19 20:55 ` Johannes Weiner
2013-07-19 20:55 ` [patch 1/3] mm: vmscan: fix numa reclaim balance problem in kswapd Johannes Weiner
2013-07-19 20:55   ` Johannes Weiner
2013-07-22 19:47   ` Rik van Riel
2013-07-22 19:47     ` Rik van Riel
2013-07-22 20:14     ` Johannes Weiner
2013-07-22 20:14       ` Johannes Weiner
2013-07-26 22:53   ` Andrew Morton
2013-07-26 22:53     ` Andrew Morton
2013-07-30 17:45     ` Johannes Weiner
2013-07-30 17:45       ` Johannes Weiner
2013-07-31 12:43       ` Johannes Weiner
2013-07-31 12:43         ` Johannes Weiner
2013-07-19 20:55 ` [patch 2/3] mm: page_alloc: rearrange watermark checking in get_page_from_freelist Johannes Weiner
2013-07-19 20:55   ` Johannes Weiner
2013-07-22 19:51   ` Rik van Riel
2013-07-22 19:51     ` Rik van Riel
2013-07-19 20:55 ` [patch 3/3] mm: page_alloc: fair zone allocator policy Johannes Weiner
2013-07-19 20:55   ` Johannes Weiner
2013-07-22 20:21   ` Rik van Riel
2013-07-22 20:21     ` Rik van Riel
2013-07-22 21:04     ` Johannes Weiner
2013-07-22 21:04       ` Johannes Weiner
2013-07-22 22:48       ` Rik van Riel
2013-07-22 22:48         ` Rik van Riel
2013-07-25  6:50     ` Paul Bolle
2013-07-25  6:50       ` Paul Bolle
2013-07-25 15:10       ` Johannes Weiner
2013-07-25 15:10         ` Johannes Weiner
2013-07-25 15:20         ` Paul Bolle
2013-07-25 15:20           ` Paul Bolle
2013-07-29 17:48   ` Andrea Arcangeli
2013-07-29 17:48     ` Andrea Arcangeli
2013-07-29 22:24     ` Johannes Weiner
2013-07-29 22:24       ` Johannes Weiner
2013-08-01  2:56   ` Minchan Kim
2013-08-01  2:56     ` Minchan Kim
2013-08-01  4:31     ` Rik van Riel
2013-08-01  4:31       ` Rik van Riel
2013-08-01 15:51       ` Andrea Arcangeli [this message]
2013-08-01 15:51         ` Andrea Arcangeli
2013-08-01 19:58         ` Johannes Weiner
2013-08-01 19:58           ` Johannes Weiner
2013-08-01 22:16           ` Andrea Arcangeli
2013-08-01 22:16             ` Andrea Arcangeli
2013-08-02  6:22     ` Johannes Weiner
2013-08-02  6:22       ` Johannes Weiner
2013-08-02  7:32       ` Minchan Kim
2013-08-02  7:32         ` Minchan Kim
2013-07-22 16:48 ` [patch 0/3] mm: improve page aging fairness between zones/nodes Zlatko Calusic
2013-07-22 16:48   ` Zlatko Calusic
2013-07-22 17:01   ` Johannes Weiner
2013-07-22 17:01     ` Johannes Weiner
2013-07-22 17:14     ` Zlatko Calusic
2013-07-24 11:18   ` Zlatko Calusic
2013-07-24 12:46     ` Hush Bensen
2013-07-24 12:46       ` Hush Bensen
2013-07-24 13:59       ` Zlatko Calusic
2013-07-24 13:59         ` Zlatko Calusic
2013-07-31  9:33     ` Zlatko Calusic
2013-07-31  9:33       ` Zlatko Calusic
2013-07-26 22:45 ` Andrew Morton
2013-07-26 22:45   ` Andrew Morton
2013-07-26 23:14   ` Johannes Weiner
2013-07-26 23:14     ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130801155111.GO25926@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.