From: Jerome Glisse <jglisse@redhat.com>
To: Bob Liu <lliubbo@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>,
Linux-Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
David Nellans <dnellans@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Balbir Singh <bsingharora@gmail.com>,
Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5
Date: Fri, 21 Jul 2017 11:21:07 -0400 [thread overview]
Message-ID: <20170721152107.GA3202@redhat.com> (raw)
In-Reply-To: <CAA_GA1du0qd8b8Eq2yVeULo6TxXw2YckABWiwY8RO5N7FB+Z=A@mail.gmail.com>
On Fri, Jul 21, 2017 at 08:01:07PM +0800, Bob Liu wrote:
> On Fri, Jul 21, 2017 at 10:10 AM, Bob Liu <liubo95@huawei.com> wrote:
> > On 2017/7/21 9:41, Jerome Glisse wrote:
> >> On Fri, Jul 21, 2017 at 09:15:29AM +0800, Bob Liu wrote:
> >>> On 2017/7/20 23:03, Jerome Glisse wrote:
> >>>> On Wed, Jul 19, 2017 at 05:09:04PM +0800, Bob Liu wrote:
> >>>>> On 2017/7/19 10:25, Jerome Glisse wrote:
> >>>>>> On Wed, Jul 19, 2017 at 09:46:10AM +0800, Bob Liu wrote:
> >>>>>>> On 2017/7/18 23:38, Jerome Glisse wrote:
> >>>>>>>> On Tue, Jul 18, 2017 at 11:26:51AM +0800, Bob Liu wrote:
> >>>>>>>>> On 2017/7/14 5:15, Jerome Glisse wrote:
> >>
> >> [...]
> >>
> >>>>> Then it's more like replacing the NUMA-node solution (CDM) with ZONE_DEVICE
> >>>>> (type MEMORY_DEVICE_PUBLIC). But the problem is the same, e.g. how to make
> >>>>> sure the device memory, say HBM, won't be occupied by normal CPU allocations.
> >>>>> Things get more complex if there are multiple GPUs connected by NVLink
> >>>>> (also cache coherent) in a system, each with its own HBM.
> >>>>>
> >>>>> How do we decide whether to allocate physical memory from local HBM/DDR or
> >>>>> remote HBM/DDR?
> >>>>>
> >>>>> If using the NUMA (CDM) approach, there are at least the NUMA mempolicy and
> >>>>> autonuma mechanisms.
> >>>>
> >>>> NUMA is not as easy as you think. First, like I said, we want the device
> >>>> memory to be isolated from most existing mm mechanisms, both because the
> >>>> memory is unreliable and because the device might need to be able to evict
> >>>> memory to make contiguous physical memory allocations for graphics.
> >>>>
> >>>
> >>> Right, but we need isolation anyway.
> >>> For HMM-CDM, the isolation is not adding device memory to the LRU list, plus
> >>> many if (is_device_public_page(page)) ... checks.
> >>>
> >>> But how to evict device memory?
> >>
> >> What do you mean by evict? The device driver can evict whenever it sees the
> >> need to do so. A CPU page fault will evict too. Process exit or munmap() will
> >> free the device memory.
> >>
> >> Are you referring to evicting in the sense of memory reclaim under pressure?
> >>
> >> The way it flows for memory pressure is that if the device driver wants to
> >> make room, it can evict stuff to system memory, and if there is not enough
> >
> > Yes, I mean this.
> > So every driver has to maintain its own LRU-like list instead of reusing
> > what is already in the Linux kernel.
Regarding LRU, it is again not that easy. First, we do not necessarily have
access information for device page tables the way we do for CPU page tables.
Second, the mmu_notifier callback on a per-page basis is costly. Finally,
devices are used differently than CPUs: usually you schedule a job, and once
that job is done you can safely evict the memory it was using. Existing device
drivers already have quite large memory management code of their own because
of that different usage model.

LRU might make sense at some point, but so far I doubt it is the right
solution for device memory.
>
> And how can HMM-CDM handle multiple devices, or a device with multiple
> device memories (possibly with different properties)?
> This kind of hardware platform will be very common once CCIX is out.
A) Multiple devices under one driver are under that driver's control. Multiple
devices linked to each other through a dedicated link can themselves form a
complex topology, and remote access between devices is highly tied to the
device (how to program the device MMU and device registers) and thus to the
device driver.

If we identify common design patterns between different hardware, then we
might start thinking about factoring out some common code to help those cases.
B) Multiple different devices is a harder problem. Each device provides its
own userspace API, and it is through that API that you will get memory
placement advice. If several devices fight over placement of the same chunk
of memory, one can argue that either the application or the device is broken.
But for now we assume that devices and applications will behave.
Rate limiting migration is hard: you need to keep migration statistics, and
that needs memory. So unless we really need it, I would rather avoid doing
that. Again, this is a thing for which we will have to wait and see how
things pan out.
Maybe I should stress that HMM is a set of helpers for device memory; it is
not intended to be a policy maker or to manage device memory. The intention
is that device drivers will keep managing device memory as they already do
today.

A deeper integration with process memory management is probably bound to
happen, but for now it is just about having a toolbox for device drivers.
Jerome
Thread overview: 43+ messages
2017-07-13 21:15 [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5 Jérôme Glisse
2017-07-13 21:15 ` [PATCH 1/6] mm/zone-device: rename DEVICE_PUBLIC to DEVICE_HOST Jérôme Glisse
2017-07-17 9:09 ` Balbir Singh
2017-07-13 21:15 ` [PATCH 2/6] mm/device-public-memory: device memory cache coherent with CPU v4 Jérôme Glisse
2017-07-13 23:01 ` Balbir Singh
2017-07-13 21:15 ` [PATCH 3/6] mm/hmm: add new helper to hotplug CDM memory region v3 Jérôme Glisse
2017-07-13 21:15 ` [PATCH 4/6] mm/memcontrol: allow to uncharge page without using page->lru field Jérôme Glisse
2017-07-17 9:10 ` Balbir Singh
2017-07-13 21:15 ` [PATCH 5/6] mm/memcontrol: support MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_PUBLIC v3 Jérôme Glisse
2017-07-17 9:15 ` Balbir Singh
2017-07-13 21:15 ` [PATCH 6/6] mm/hmm: documents how device memory is accounted in rss and memcg Jérôme Glisse
2017-07-14 13:26 ` Michal Hocko
2017-07-18 3:26 ` [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5 Bob Liu
2017-07-18 15:38 ` Jerome Glisse
2017-07-19 1:46 ` Bob Liu
2017-07-19 2:25 ` Jerome Glisse
2017-07-19 9:09 ` Bob Liu
2017-07-20 15:03 ` Jerome Glisse
2017-07-21 1:15 ` Bob Liu
2017-07-21 1:41 ` Jerome Glisse
2017-07-21 2:10 ` Bob Liu
2017-07-21 12:01 ` Bob Liu
2017-07-21 15:21 ` Jerome Glisse [this message]
2017-07-21 3:48 ` Dan Williams
2017-07-21 15:22 ` Jerome Glisse
2017-09-05 19:36 ` Jerome Glisse
2017-09-09 23:22 ` Bob Liu
2017-09-11 23:36 ` Jerome Glisse
2017-09-12 1:02 ` Bob Liu
2017-09-12 16:17 ` Jerome Glisse
2017-09-26 9:56 ` Bob Liu
2017-09-26 16:16 ` Jerome Glisse
2017-09-30 2:57 ` Bob Liu
2017-09-30 22:49 ` Jerome Glisse
2017-10-11 13:15 ` Bob Liu
2017-10-12 15:37 ` Jerome Glisse
2017-11-16 2:10 ` chet l
2017-11-16 2:44 ` Jerome Glisse
2017-11-16 3:23 ` chetan L
2017-11-16 3:29 ` chetan L
2017-11-16 21:29 ` Jerome Glisse
2017-11-16 22:41 ` chetan L
2017-11-16 23:11 ` Jerome Glisse