linux-mm.kvack.org archive mirror
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: "Balbir Singh" <bsingharora@gmail.com>,
	linux-mm <linux-mm@kvack.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"Anshuman Khandual" <khandual@linux.vnet.ibm.com>,
	"Aneesh Kumar KV" <aneesh.kumar@linux.vnet.ibm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	"Srikar Dronamraju" <srikar@linux.vnet.ibm.com>,
	"Haren Myneni" <haren@linux.vnet.ibm.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Reza Arbab" <arbab@linux.vnet.ibm.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Christoph Lameter" <cl@linux.com>,
	"Rik van Riel" <riel@redhat.com>
Subject: Re: [RFC summary] Enable Coherent Device Memory
Date: Wed, 17 May 2017 19:03:46 +1000	[thread overview]
Message-ID: <1495011826.3092.18.camel@kernel.crashing.org> (raw)
In-Reply-To: <20170517082836.whe3hggeew23nwvz@techsingularity.net>

On Wed, 2017-05-17 at 09:28 +0100, Mel Gorman wrote:
> On Wed, May 17, 2017 at 08:26:47AM +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2017-05-16 at 09:43 +0100, Mel Gorman wrote:
> > > I'm not sure what you're asking here. migration is only partially
> > > transparent but a move_pages call will be necessary to force pages onto
> > > CDM if binding policies are not used so the cost of migration will be
> > > invisible. Even if you made it "transparent", the migration cost would
> > > be incurred at fault time. If anything, using move_pages would be more
> > > predictable as you control when the cost is incurred.
> > 
> > One of the main point of this whole exercise is for applications to not
> > have to bother with any of this and now you are bringing all back into
> > their lap.
> > 
> > The base idea behind the counters we have on the link is for the HW to
> > know when memory is accessed "remotely", so that the device driver can
> > make decision about migrating pages into or away from the device,
> > especially so that applications don't have to concern themselves with
> > memory placement.
> > 
> 
> There is only so much magic that can be applied and if the manual case
> cannot be handled then the automatic case is problematic. You say that you
> want kswapd disabled, but have nothing to handle overcommit sanely.

I am not certain we want kswapd disabled; that is definitely more of a
userspace policy, I agree. It could be that, in this case, it should
prioritize different pages but still be able to push them out. We *do*
have age counting etc... just less efficient / higher cost.

>  You
> want to disable automatic NUMA balancing yet also be able to automatically
> detect when data should move from CDM (automatic NUMA balancing by design
> couldn't move data to CDM without driver support tracking GPU accesses).

We can, via a driver-specific hook, since we have dedicated counters
on the link; that is also why we don't want the autonuma-based
approach, which makes PTEs inaccessible.

> To handle it transparently, either the driver needs to do the work in which
> case no special core-kernel support is needed beyond what already exists or
> there is a userspace daemon like numad running in userspace that decides
> when to trigger migrations on a separate process that is using CDM which
> would need to gather information from the driver.

The driver can handle it, we just need autonuma off the CDM memory (it
can continue operating normally on system memory).

> In either case, the existing isolation mechanisms are still sufficient as
> long as the driver hot-adds the CDM memory from a userspace trigger that
> it then responsible for setting up the isolation.

Yes, I think the NUMA node based approach works fine using a lot of
existing infrastructure. There are a couple of gaps which we need to
look at fixing one way or another, such as the above, but overall I
don't see the need for some major overhaul, nor do I see the need to
go down the path of ZONE_DEVICE.
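
For the record, the isolation piece looks roughly like this from
userspace (a sketch only: the node number, memory block name and
cpuset names below are made up for illustration, and the exact block
to online depends on where the CDM lands):

```shell
# 1. After the driver hot-adds the CDM, userspace onlines it as
#    movable so kernel allocations stay off it ("memoryXX" is a
#    placeholder for the real memory block):
echo online_movable > /sys/devices/system/memory/memoryXX/state

# 2. Fence the new node (say node 1) off with cpusets so ordinary
#    tasks never allocate from it; only tasks placed in the "gpu"
#    cpuset may also use CDM memory:
mkdir /sys/fs/cgroup/cpuset/system /sys/fs/cgroup/cpuset/gpu
echo 0   > /sys/fs/cgroup/cpuset/system/cpuset.mems  # system RAM only
echo 0-1 > /sys/fs/cgroup/cpuset/gpu/cpuset.mems     # RAM + CDM node
```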

> All that aside, this series has nothing to do with the type of magic
> you describe and the feedback as given was "at this point, what you are
> looking for does not require special kernel support or heavy wiring into
> the core vm".
> 
> > Thus we want to rely on the GPU driver moving the pages around where
> > most appropriate (where they are being accessed, either core memory or
> > GPU memory) based on inputs from the HW counters monitoring the link.
> > 
> 
> And if the driver is polling all the accesses, there are still no changes
> required to the core vm as long as the driver does the hotplug and allows
> userspace to isolate if that is what the applications desire.

With one main exception ... 

We also do want normal allocations to avoid going to the GPU memory.

I.e., things should go to the GPU memory if and only if they are
either explicitly put there by the application/driver (the case where
applications do care about manual placement), or the migration case.

The latter is triggered by the driver, so it's also a case of the
driver allocating the GPU pages and doing a migration to them.

This is the key thing. Now, creating a CMA or using ZONE_MOVABLE can
handle at least keeping kernel allocations off the GPU. However, we
would also like to keep random unrelated user memory & page cache off
it as well.

There are various reasons for that, some related to the fact that the
performance characteristics of that memory (i.e. latency) could cause
nasty surprises for normal applications, some related to the fact that
this memory is rather unreliable compared to system memory...

Cheers,
Ben.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


Thread overview: 15+ messages
2017-05-12  6:18 [RFC summary] Enable Coherent Device Memory Balbir Singh
2017-05-12 10:26 ` Mel Gorman
2017-05-15 23:45   ` Balbir Singh
2017-05-16  8:43     ` Mel Gorman
2017-05-16 22:26       ` Benjamin Herrenschmidt
2017-05-17  8:28         ` Mel Gorman
2017-05-17  9:03           ` Benjamin Herrenschmidt [this message]
2017-05-17  9:15             ` Mel Gorman
2017-05-17  9:56               ` Benjamin Herrenschmidt
2017-05-17 10:58                 ` Mel Gorman
2017-05-17 19:35                   ` Benjamin Herrenschmidt
2017-05-17 19:37                   ` Benjamin Herrenschmidt
2017-05-17 12:41                 ` Michal Hocko
2017-05-17 13:54         ` Christoph Lameter
2017-05-17 19:39           ` Benjamin Herrenschmidt
