From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
Jerome Glisse <j.glisse@gmail.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
jglisse@redhat.com, mgorman@suse.de, aarcange@redhat.com,
airlied@redhat.com, aneesh.kumar@linux.vnet.ibm.com,
Cameron Buschardt <cabuschardt@nvidia.com>,
Mark Hairgrove <mhairgrove@nvidia.com>,
Geoffrey Gerfin <ggerfin@nvidia.com>,
John McKenna <jmckenna@nvidia.com>,
akpm@linux-foundation.org
Subject: Re: Interacting with coherent memory on external devices
Date: Thu, 14 May 2015 09:38:46 +1000 [thread overview]
Message-ID: <1431560326.20218.94.camel@kernel.crashing.org> (raw)
In-Reply-To: <55535B6E.5090700@suse.cz>
On Wed, 2015-05-13 at 16:10 +0200, Vlastimil Babka wrote:
> Sorry for reviving an oldish thread...
Well, that's actually appreciated, since this is the kind of constructive
discussion I was hoping to trigger initially :-) I'll look at
ZONE_MOVABLE; I wasn't aware of its existence.
Don't we still have the problem that ZONEs must be more or less contiguous
chunks? I.e., my "CAPI memory" will be somewhat interleaved in the
physical address space. This is due to the address-space layout on some
of those systems, where you'll basically have something along the lines of:
[ node 0 mem ] [ node 0 CAPI dev ] .... [ node 1 mem] [ node 1 CAPI dev] ...
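The interleaving concern can be made concrete with a minimal user-space sketch. It uses a made-up physical memory map that follows the layout above. The sketch finds the smallest single contiguous span covering all device memory, then checks whether that span would also cover system RAM. The names (`struct phys_range`, `span_is_interleaved`, `example_map`) and all addresses are hypothetical. This models only the physical address map, not how the kernel actually sizes its zones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical memory map, following the layout sketched above:
 * [ node 0 mem ] [ node 0 CAPI dev ] ... [ node 1 mem ] [ node 1 CAPI dev ]
 * All addresses are invented for illustration. */
enum mem_kind { SYSTEM_RAM, DEVICE_MEM };

struct phys_range {
    uint64_t start;      /* inclusive */
    uint64_t end;        /* exclusive */
    int node;
    enum mem_kind kind;
};

/* Returns true if the smallest single [lo, hi) span covering every range
 * of kind `k` also overlaps a range of another kind, i.e. one contiguous
 * span could not hold all memory of that kind and nothing else. */
static bool span_is_interleaved(const struct phys_range *map, size_t n,
                                enum mem_kind k)
{
    uint64_t lo = UINT64_MAX, hi = 0;

    assert(map != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (map[i].kind != k)
            continue;
        if (map[i].start < lo)
            lo = map[i].start;
        if (map[i].end > hi)
            hi = map[i].end;
    }
    if (lo >= hi)
        return false;   /* no memory of this kind at all */

    for (size_t i = 0; i < n; i++) {
        if (map[i].kind == k)
            continue;
        /* half-open interval overlap test */
        if (map[i].start < hi && map[i].end > lo)
            return true;
    }
    return false;
}

/* Two nodes, each with system RAM followed by device memory. */
static const struct phys_range example_map[] = {
    { 0x0000000000ULL, 0x1000000000ULL, 0, SYSTEM_RAM },
    { 0x1000000000ULL, 0x2000000000ULL, 0, DEVICE_MEM },
    { 0x4000000000ULL, 0x5000000000ULL, 1, SYSTEM_RAM },
    { 0x5000000000ULL, 0x6000000000ULL, 1, DEVICE_MEM },
};
```

With the full four-entry map, `span_is_interleaved(example_map, 4, DEVICE_MEM)` returns true, because node 1's RAM sits between the two device ranges. With only node 0's two entries, it returns false.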
> On 04/28/2015 01:54 AM, Benjamin Herrenschmidt wrote:
> > On Mon, 2015-04-27 at 11:48 -0500, Christoph Lameter wrote:
> >> On Mon, 27 Apr 2015, Rik van Riel wrote:
> >>
> >>> Why would we want to avoid the sane approach that makes this thing
> >>> work with the fewest required changes to core code?
> >>
> >> Because new ZONEs are a pretty invasive change to memory management, and
> >> because there are other ways to handle references to device-specific
> >> memory.
> >
> > A new ZONE is just one option we put on the table.
> >
> > I think we can mostly agree on the fundamentals that a good model of
> > such a co-processor is a NUMA node, possibly with a higher distance
> > than other nodes (but even that can be debated).
> >
> > That gives us a lot of the basics we need, such as struct page and the
> > ability to use the existing migration infrastructure, and it is actually
> > a reasonable representation at a high level as well.
> >
> > The question is how we additionally get the random stuff we don't
> > care about out of the way. The large distance will not help that much
> > under memory pressure, for example.
> >
> > Covering the entire device memory with a CMA region goes a long way toward
> > that goal. It will keep out your ordinary kernel allocations.
>
> I think ZONE_MOVABLE should be sufficient for this. CMA is basically for
> marking parts of zones as MOVABLE-only. You shouldn't need that for the
> whole zone. Although it might happen that CMA will be a special zone one
> day.
>
> > It also provides just what we need to be able to do large contiguous
> > "explicit" allocations for use by workloads that don't want the
> > transparent migration and by the driver for the device which might also
> > need such special allocations for its own internal management data
> > structures.
>
> Plain zone compaction + reclaim should work as well in a ZONE_MOVABLE
> zone. CMA allocations might IIRC additionally migrate across zones, e.g.
> from the device to system memory (unlike plain compaction), which might
> be what you want, or not.
>
> > However, we still have the risk of pages in the CMA region being pinned by
> > something like gup; that's where the ZONE idea comes in, to ensure the
> > various kernel allocators will *never* allocate in that zone unless
> > explicitly specified, but that could possibly be implemented differently.
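The following toy model shows why pinning matters for large contiguous "explicit" allocations. Movable pages in a candidate range can be migrated away. A single pinned page, such as one held through get_user_pages(), makes the whole range unusable. This is a hypothetical sketch, not kernel code. `enum page_state`, `try_claim_range` and `find_contiguous` are illustrative names, and migration is reduced to marking a page free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a region's page frames. Not kernel code. */
enum page_state { PAGE_FREE, PAGE_MOVABLE, PAGE_PINNED };

/* Try to make pages [first, first + count) free for an explicit contiguous
 * allocation. Movable pages are "migrated" (here simply marked free, standing
 * in for copying them elsewhere). A single pinned page makes the whole range
 * unusable. On failure the array is left unchanged. */
static bool try_claim_range(enum page_state *pages, size_t npages,
                            size_t first, size_t count)
{
    assert(first + count <= npages);

    /* First pass: refuse the range if anything in it is pinned. */
    for (size_t i = first; i < first + count; i++)
        if (pages[i] == PAGE_PINNED)
            return false;

    /* Second pass: migrate movable pages out of the range. */
    for (size_t i = first; i < first + count; i++)
        if (pages[i] == PAGE_MOVABLE)
            pages[i] = PAGE_FREE;

    return true;
}

/* Scan for the first window of `count` pages that can be claimed.
 * Returns the index of its first page, or -1 if no window works. */
static long find_contiguous(enum page_state *pages, size_t npages,
                            size_t count)
{
    for (size_t first = 0; first + count <= npages; first++)
        if (try_claim_range(pages, npages, first, count))
            return (long)first;
    return -1;
}
```

Consider the pages `{MOVABLE, FREE, PINNED, MOVABLE, MOVABLE, FREE, MOVABLE, PINNED}`. A four-page request succeeds at index 3. A five-page request fails because every window contains a pinned page.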
>
> Kernel allocations should ignore the ZONE_MOVABLE zone as they are not
> typically movable. Then it depends on how much control you want for
> userspace allocations.
>
> > Maybe a concept of an "exclusive" NUMA node, where allocations never
> > fall back to that node unless explicitly asked to go there.
>
> I guess that could be doable at the zonelist level, where the device
> memory node/zone wouldn't be part of the "normal" zonelists, so memory
> pressure calculations should also be fine. But sure, there will be some
> corner cases :)
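The "exclusive" node idea, expressed at the zonelist level as suggested above, can be sketched as a per-node fallback list. The list is sorted by NUMA distance and leaves out exclusive nodes unless the allocation starts on them. The same sketch shows the earlier point that a large distance alone does not help under memory pressure. Once the CPU nodes are full, a distance-ordered list still spills onto the device node. Everything here is hypothetical: `struct toy_topology`, `build_fallback`, `alloc_page_on` and the distance values. "Explicitly asked" is simplified to "the allocation starts on that node".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

/* Toy NUMA description. distance[a][b] mimics a SLIT-style table where
 * 10 means local; the values used in the example are invented. */
struct toy_topology {
    int nr_nodes;
    int distance[MAX_NODES][MAX_NODES];
    bool exclusive[MAX_NODES];   /* hypothetical "exclusive" node flag */
};

/* Build the fallback order for allocations starting on `local`: nodes
 * sorted by distance, with exclusive nodes other than `local` left out.
 * Returns the number of entries written to order[]. */
static int build_fallback(const struct toy_topology *t, int local,
                          int order[MAX_NODES])
{
    int n = 0;

    assert(t->nr_nodes <= MAX_NODES);
    for (int node = 0; node < t->nr_nodes; node++)
        if (node == local || !t->exclusive[node])
            order[n++] = node;

    /* Stable insertion sort by distance from `local`. */
    for (int i = 1; i < n; i++) {
        int cur = order[i], j = i - 1;
        while (j >= 0 &&
               t->distance[local][order[j]] > t->distance[local][cur]) {
            order[j + 1] = order[j];
            j--;
        }
        order[j + 1] = cur;
    }
    return n;
}

/* Take one page from the first node in the fallback order that has any.
 * Returns the node used, or -1 when every eligible node is full. */
static int alloc_page_on(const struct toy_topology *t, int local,
                         long free_pages[MAX_NODES])
{
    int order[MAX_NODES];
    int n = build_fallback(t, local, order);

    for (int i = 0; i < n; i++) {
        if (free_pages[order[i]] > 0) {
            free_pages[order[i]]--;
            return order[i];
        }
    }
    return -1;
}
```

In the example, nodes 0 and 1 are full and device node 2 sits at distance 40. An allocation starting on node 0 lands on node 2. Once node 2 is marked exclusive, the same allocation fails and returns -1. An allocation that starts on node 2 still succeeds.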
>
> > Of course that would have an impact on memory pressure calculations;
> > nothing comes completely for free. But at this stage, that is the goal
> > of this thread, i.e., to swap ideas around and see what's most likely to
> > work in the long run before we even start implementing something.
> >
> > Cheers,
> > Ben.
> >
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org. For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
Thread overview: 97+ messages
2015-04-21 21:44 Interacting with coherent memory on external devices Paul E. McKenney
2015-04-21 23:46 ` Jerome Glisse
2015-04-22 0:36 ` Benjamin Herrenschmidt
2015-04-22 12:42 ` Paul E. McKenney
2015-04-21 23:49 ` Christoph Lameter
2015-04-22 0:05 ` Jerome Glisse
2015-04-22 0:50 ` Christoph Lameter
2015-04-22 1:01 ` Benjamin Herrenschmidt
2015-04-22 13:35 ` Paul E. McKenney
2015-04-22 13:18 ` Paul E. McKenney
2015-04-22 16:16 ` Christoph Lameter
2015-04-22 17:07 ` Jerome Glisse
2015-04-22 18:17 ` Christoph Lameter
2015-04-22 18:52 ` Paul E. McKenney
2015-04-23 14:12 ` Christoph Lameter
2015-04-23 19:24 ` Paul E. McKenney
2015-04-24 14:01 ` Christoph Lameter
2015-04-24 14:13 ` Paul E. McKenney
2015-04-24 15:53 ` Rik van Riel
2015-04-23 2:36 ` Benjamin Herrenschmidt
2015-04-23 14:10 ` Christoph Lameter
2015-04-23 15:42 ` Jerome Glisse
2015-04-24 14:04 ` Christoph Lameter
2015-04-23 22:29 ` Benjamin Herrenschmidt
2015-04-23 2:30 ` Benjamin Herrenschmidt
2015-04-23 14:25 ` Christoph Lameter
2015-04-23 15:25 ` Austin S Hemmelgarn
2015-04-23 19:33 ` Paul E. McKenney
2015-04-24 14:12 ` Christoph Lameter
2015-04-24 14:57 ` Paul E. McKenney
2015-04-24 15:09 ` Jerome Glisse
2015-04-25 11:20 ` Paul E. McKenney
2015-04-24 15:52 ` Christoph Lameter
2015-04-23 22:37 ` Benjamin Herrenschmidt
2015-04-24 14:09 ` Christoph Lameter
2015-04-23 16:04 ` Rik van Riel
2015-04-22 0:42 ` Benjamin Herrenschmidt
2015-04-22 0:57 ` Paul E. McKenney
2015-04-22 1:04 ` Benjamin Herrenschmidt
2015-04-22 15:25 ` Christoph Lameter
2015-04-22 16:31 ` Jerome Glisse
2015-04-22 17:14 ` Christoph Lameter
2015-04-22 19:07 ` Jerome Glisse
2015-04-23 2:34 ` Benjamin Herrenschmidt
2015-04-23 14:38 ` Christoph Lameter
2015-04-23 16:11 ` Jerome Glisse
2015-04-24 14:29 ` Christoph Lameter
2015-04-24 15:08 ` Jerome Glisse
2015-04-24 16:03 ` Christoph Lameter
2015-04-24 16:43 ` Jerome Glisse
2015-04-24 16:58 ` Christoph Lameter
2015-04-24 17:19 ` Jerome Glisse
2015-04-24 18:56 ` Christoph Lameter
2015-04-24 19:29 ` Jerome Glisse
2015-04-24 20:00 ` Christoph Lameter
2015-04-24 20:32 ` Jerome Glisse
2015-04-25 11:46 ` Paul E. McKenney
2015-04-27 15:08 ` Christoph Lameter
2015-04-27 15:47 ` Jerome Glisse
2015-04-27 16:17 ` Christoph Lameter
2015-04-27 16:29 ` Rik van Riel
2015-04-27 16:48 ` Christoph Lameter
2015-04-27 23:54 ` Benjamin Herrenschmidt
2015-05-13 14:10 ` Vlastimil Babka
2015-05-13 23:38 ` Benjamin Herrenschmidt [this message]
2015-05-14 7:39 ` Vlastimil Babka
2015-05-14 7:51 ` Benjamin Herrenschmidt
2015-05-28 18:18 ` Paul E. McKenney
2015-04-27 16:43 ` Jerome Glisse
2015-04-27 16:51 ` Christoph Lameter
2015-04-27 17:21 ` Jerome Glisse
2015-04-27 19:26 ` Christoph Lameter
2015-04-27 19:35 ` Rik van Riel
2015-04-27 20:52 ` Jerome Glisse
2015-04-28 14:18 ` Christoph Lameter
2015-04-28 17:20 ` Jerome Glisse
2015-04-27 16:15 ` Paul E. McKenney
2015-04-27 16:31 ` Christoph Lameter
2015-04-24 23:45 ` Benjamin Herrenschmidt
2015-04-23 18:52 ` Paul E. McKenney
2015-04-24 14:30 ` Christoph Lameter
2015-04-24 14:54 ` Paul E. McKenney
2015-04-24 15:49 ` Christoph Lameter
2015-04-24 16:06 ` Rik van Riel
2015-04-25 11:49 ` Paul E. McKenney
2015-04-24 16:00 ` Jerome Glisse
2015-04-24 16:08 ` Rik van Riel
2015-04-23 17:28 ` Rik van Riel
2015-04-23 2:27 ` Benjamin Herrenschmidt
2015-04-23 14:20 ` Christoph Lameter
2015-04-23 16:22 ` Jerome Glisse
2015-04-24 18:41 ` Oded Gabbay
2015-04-23 19:00 ` Paul E. McKenney
2015-04-22 15:20 ` Christoph Lameter
2015-04-25 2:32 ` Rik van Riel
2015-04-25 3:32 ` Benjamin Herrenschmidt
2015-04-25 11:55 ` Paul E. McKenney