From: Simon Jeons <simon.jeons@gmail.com>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: Michel Lespinasse <walken@google.com>,
	Shachar Raindel <raindel@mellanox.com>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Roland Dreier <roland@purestorage.com>,
	Haggai Eran <haggaie@mellanox.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Sagi Grimberg <sagig@mellanox.com>,
	Liran Liss <liranl@mellanox.com>
Subject: Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes
Date: Fri, 12 Apr 2013 13:44:38 +0800
Message-ID: <51679F46.7030901@gmail.com>
In-Reply-To: <CAH3drwYee1mKMPcT5QJNsaGGEvJHNTPFEvndpvS+HkeuwwAYmg@mail.gmail.com>

Hi Jerome,
On 04/12/2013 10:57 AM, Jerome Glisse wrote:
> On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com> wrote:
>> Hi Jerome,
>>
>> On 04/12/2013 02:38 AM, Jerome Glisse wrote:
>>> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote:
>>>> Hi Jerome,
>>>>
>>>> On 04/11/2013 04:45 AM, Jerome Glisse wrote:
>>>>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote:
>>>>>> Hi Jerome,
>>>>>>
>>>>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote:
>>>>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote:
>>>>>>>> Hi Jerome,
>>>>>>>>
>>>>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote:
>>>>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote:
>>>>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We would like to present a reference implementation for safely
>>>>>>>>>>> sharing memory pages from user space with the hardware, without
>>>>>>>>>>> pinning.
>>>>>>>>>>>
>>>>>>>>>>> We will be happy to hear the community's feedback on our
>>>>>>>>>>> prototype implementation, and suggestions for future
>>>>>>>>>>> improvements.
>>>>>>>>>>>
>>>>>>>>>>> We would also like to discuss adding features to the core MM
>>>>>>>>>>> subsystem to assist hardware access to user memory without
>>>>>>>>>>> pinning.
>>>>>>>>>>
>>>>>>>>>> This sounds kinda scary TBH; however, I do understand the need
>>>>>>>>>> for such technology.
>>>>>>>>>>
>>>>>>>>>> I think one issue is that many MM developers are insufficiently
>>>>>>>>>> aware of such developments; having a technology presentation
>>>>>>>>>> would probably help there; but traditionally LSF/MM sessions are
>>>>>>>>>> more interactive between developers who are already quite
>>>>>>>>>> familiar with the technology. I think it would help if you could
>>>>>>>>>> send in advance a detailed presentation of the problem and the
>>>>>>>>>> proposed solutions (and then what they require of the MM layer)
>>>>>>>>>> so people can be better prepared.
>>>>>>>>>>
>>>>>>>>>> And first I'd like to ask: aren't IOMMUs supposed to already
>>>>>>>>>> largely solve this problem? (Probably a dumb question, but that
>>>>>>>>>> just tells you how much you need to explain. :)
>>>>>>>>>
>>>>>>>>> For GPUs the motivation is threefold. With the advance of GPU
>>>>>>>>> compute, and also with newer graphics programs, we see a massive
>>>>>>>>> increase in GPU memory consumption. We can easily reach buffers
>>>>>>>>> that are bigger than 1 GB. So the first motivation is to directly
>>>>>>>>> use, in the GPU, the memory the user allocated through malloc;
>>>>>>>>> this avoids copying 1 GB of data with the CPU to the GPU buffer.
>>>>>>>>> The second, and most important to GPU compute, is using the GPU
>>>>>>>>> seamlessly with the CPU; to achieve this you want the programmer
>>>>>>>>> to have a single address space on the CPU and GPU, so that the
>>>>>>>>> same address points to the same object on the GPU as on the CPU.
>>>>>>>>> This would also be a tremendously cleaner design, from the
>>>>>>>>> driver's point of view, for memory management.
>>>>>>>>>
>>>>>>>>> And last, and most important: with such big buffers (>1 GB),
>>>>>>>>> memory pinning is becoming way too expensive, and it drastically
>>>>>>>>> reduces the freedom of the mm to free pages for other processes.
>>>>>>>>> Most of the time only a small window of the object will be in use
>>>>>>>>> by the hardware (everything is relative; the window can be
>>>>>>>>> >100 MB, so not so small :)). Hardware pagefault support would
>>>>>>>>> avoid the necessity to
>>>>>>>>
>>>>>>>> What is the meaning of "hardware pagefault"?
>>>>>>>
>>>>>>> It's a PCIe extension (well, a combination of extensions that allow
>>>>>>> it; see http://www.pcisig.com/specifications/iov/ats/). The idea is
>>>>>>> that the iommu can trigger a regular pagefault inside a process
>>>>>>> address space on behalf of the hardware. The only iommu supporting
>>>>>>> that right now is the AMD IOMMUv2 that you find on recent AMD
>>>>>>> platforms.
>>>>>>
>>>>>> Why is a hardware page fault needed? A regular page fault is
>>>>>> triggered by the CPU MMU, correct?
>>>>>
>>>>> Well, here I abuse the term "regular page fault". The idea is that
>>>>> with hardware pagefaults you don't need to pin memory or take a
>>>>> reference on a page for the hardware to use it, so the kernel can
>>>>> free, as usual, pages that would otherwise have been
>>>>
>>>> For the case when the GPU needs to pin memory, why does the GPU need
>>>> to grab the memory of a normal process instead of allocating it for
>>>> itself?
>>>
>>> Pinned memory is today's world, where the gpu allocates its own memory
>>> (GBs of memory) that disappears from kernel control, i.e. the kernel
>>> can no longer reclaim this memory; it's lost memory (I have already
>>> had complaints about that from users who saw GBs of memory vanish and
>>> couldn't understand why the GPU was using so much).
>>>
>>> In tomorrow's world we want the gpu to be able to access memory that
>>> the application allocated through a simple malloc, and we want the
>>> kernel to be able to recycle any page at any time, because of memory
>>> pressure or because the kernel decides to do so.
>>>
>>> That's just what we want to do. To achieve it we are getting hardware
>>> that can do pagefaults. No changes to the kernel core mm code (some
>>> improvements might be made).
>>
>> The memory disappears since you hold a reference (gup) against it,
>> correct? In tomorrow's world you want the page fault triggered through
>> the iommu driver, which calls get_user_pages; that will also take a
>> reference (since gup is called), won't it? Anyway, assuming tomorrow's
>> world doesn't take a reference, don't we need to care when a page in
>> use by the GPU is reclaimed?
>
> Right now the code uses gup because it's convenient, but it drops the
> reference right after the fault, so the reference is held only for a
> short period of time.

Are you sure gup will drop the reference right after the fault? I dug
into the code again and failed to verify this. Could you point it out
to me?
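
If I understand you correctly, the pattern would look roughly like the
below (a minimal sketch against the current get_user_pages() signature;
tsk/mm/addr/write are placeholders and error handling is omitted):

	struct page *page;
	int ret;

	down_read(&mm->mmap_sem);
	/* Fault the page in on behalf of the device. */
	ret = get_user_pages(tsk, mm, addr, 1, write, 0, &page, NULL);
	if (ret == 1) {
		/* Program the device mapping, then drop the reference
		 * right away: from here on the mmu notifier, not the
		 * page refcount, keeps the device in sync. */
		put_page(page);
	}
	up_read(&mm->mmap_sem);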

>
> No, you don't need to care about reclaim, thanks to mmu notifiers: the
> iommu registers a notifier, and before a page is removed the notifier
> is called, so the iommu gets the invalidate event, invalidates the
> device tlb, and things go on. If the gpu accesses the page, a new
> pagefault happens and a new page is allocated.

Good idea! ;-)
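
So the driver side would be shaped roughly like this, if I follow (a
sketch against the current mmu notifier API; dev_tlb_invalidate() is a
made-up stand-in for the device-specific tlb shootdown):

	#include <linux/mmu_notifier.h>

	static void dev_mn_invalidate_page(struct mmu_notifier *mn,
					   struct mm_struct *mm,
					   unsigned long address)
	{
		/* The kernel is about to unmap this page: shoot down
		 * the device tlb entry so the hardware refaults. */
		dev_tlb_invalidate(mn, address, address + PAGE_SIZE);
	}

	static void dev_mn_invalidate_range_start(struct mmu_notifier *mn,
						  struct mm_struct *mm,
						  unsigned long start,
						  unsigned long end)
	{
		dev_tlb_invalidate(mn, start, end);
	}

	static const struct mmu_notifier_ops dev_mn_ops = {
		.invalidate_page	= dev_mn_invalidate_page,
		.invalidate_range_start	= dev_mn_invalidate_range_start,
	};

	/* With dev->mn.ops = &dev_mn_ops, hook into the process: */
	ret = mmu_notifier_register(&dev->mn, current->mm);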

>
> All this code is upstream in the linux kernel; just read it. There is
> just no device that uses it yet.
>
> That being said, we will want improvements so that pages that are hot
> in the device are not reclaimed. But it can work without such
> improvements.
>
> Cheers,
> Jerome
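
For anyone else following the thread: the AMD IOMMUv2 support Jerome
mentioned earlier appears to be exposed to drivers roughly like this,
as far as I can tell from drivers/iommu/amd_iommu_v2.c
(pdev/pasid/num_pasids are illustrative):

	#include <linux/amd-iommu.h>

	/* Declare how many PASIDs (process address space IDs) the
	 * device will use. */
	ret = amd_iommu_init_device(pdev, num_pasids);

	/* Bind a PASID to a process: the device can now issue ATS/PRI
	 * requests in that address space, and faults are serviced
	 * against the process page tables instead of pinned pages. */
	ret = amd_iommu_bind_pasid(pdev, pasid, current);

	/* ... device runs, hardware pagefaults handled on demand ... */

	amd_iommu_unbind_pasid(pdev, pasid);
	amd_iommu_free_device(pdev);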

