From: Mel Gorman <mgorman@suse.de>
To: Christoph Lameter <cl@linux.com>
Cc: Linux-MM <linux-mm@kvack.org>,
Johannes Weiner <hannes@cmpxchg.org>, Dave Hansen <dave@sr71.net>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/22] mm: page allocator: Allocate/free order-0 pages from a per-zone magazine
Date: Thu, 9 May 2013 16:23:18 +0100
Message-ID: <20130509152318.GD11497@suse.de>
In-Reply-To: <0000013e85732d03-05e35c8e-205e-4242-98f5-2ae7bda64c5c-000000@email.amazonses.com>
On Wed, May 08, 2013 at 06:41:58PM +0000, Christoph Lameter wrote:
> On Wed, 8 May 2013, Mel Gorman wrote:
>
> > 1. IRQs do not have to be disabled to access the lists reducing IRQs
> > disabled times.
>
> The per cpu structure access also would not need to disable irq if the
> fast path would be using this_cpu ops.
>
How does this_cpu protect against preemption due to an interrupt?
this_cpu_read() itself only disables preemption and it's explicitly
documented that it is not reliable against an interrupt that modifies the
per-cpu data, so the use of the per-cpu lists is right out. It would require
that a race-prone check be used with cmpxchg, which in turn would require
arrays, not lists.

> > 2. As the list is protected by a spinlock, it is not necessary to
> > send IPI to drain the list. As the lists are accessible by multiple CPUs,
> > it is easier to tune.
>
> The lists are a problem since traversing list heads creates a lot of
> pressure on the processor and TLB caches. Could we either move to an array
> of pointers to page structs (like in SLAB)
They would have large memory requirements. The magazine data structure in
this series fits in a cache line. An array of 128 struct page pointers
would require 1K on 64-bit and if that thing is per possible CPU and per
zone then it could get really excessive.
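
For scale, the arithmetic above can be sketched roughly as follows. This is
only an illustration: the helper names are made up for this mail, struct page
is left opaque, and the CPU and zone counts used further down are hypothetical
rather than taken from any particular machine.

```c
#include <stddef.h>

/* Opaque stand-in for the kernel's struct page; only the pointer size matters. */
struct page;

/* Bytes needed for one array of n struct page pointers (hypothetical helper). */
static size_t pointer_array_bytes(size_t n)
{
	return n * sizeof(struct page *);
}

/*
 * Total bytes if one such array exists per possible CPU and per zone.
 * 'cpus' and 'zones' are assumptions supplied by the caller.
 */
static size_t total_array_bytes(size_t n, size_t cpus, size_t zones)
{
	return pointer_array_bytes(n) * cpus * zones;
}
```

With 8-byte pointers, pointer_array_bytes(128) is 1024 bytes, which is the 1K
figure. Taking a hypothetical 4096 possible CPUs and 4 zones,
total_array_bytes(128, 4096, 4) comes to 16M for the arrays alone.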
> or to a linked list that is
> constrained within physical boundaries like within a PMD? (comparable
> to the SLUB approach)?
>
I don't see how, as the page allocator does not control the physical location
of any pages freed to it and it's the struct pages it is linking together. On
at least some systems with 1G pages, the struct pages will be backed by
memory mapped with 1G entries so the TLB pressure should be reduced, but
the cache pressure from struct page modifications is certainly a problem.

> > 3. The magazine_lock is potentially hot but it can be split to have
> > one lock per CPU socket to reduce contention. Draining the lists
> > in this case would require multiple locks be acquired.
>
> IMHO the use of per cpu RMW operations would be lower latency than the use
> of spinlocks. There is no "lock" prefix overhead with those. Page
> allocation is a frequent operation that I would think needs to be as fast
> as possible.

The memory requirements may be large because those per-cpu areas are sized
and allocated depending on num_possible_cpus(). Correct? Regardless of their
size, it would still be required to deal with cpu hot-plug to avoid memory
leaks and draining them would still require global IPIs, so the overall
code complexity would be similar to what exists today. Ultimately all that
changes is that we use an array+cmpxchg instead of a list, which will shave
a small amount of latency, but it will still be regularly falling back to
the buddy lists and contending on the zone->lock due to the limited size of
the per-cpu magazines, hiding the advantage of using cmpxchg in the noise.

--
Mel Gorman
SUSE Labs