From: Simon Jeons <simon.jeons@gmail.com>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: Michel Lespinasse <walken@google.com>,
Shachar Raindel <raindel@mellanox.com>,
lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
Andrea Arcangeli <aarcange@redhat.com>,
Roland Dreier <roland@purestorage.com>,
Haggai Eran <haggaie@mellanox.com>,
Or Gerlitz <ogerlitz@mellanox.com>,
Sagi Grimberg <sagig@mellanox.com>,
Liran Liss <liranl@mellanox.com>
Subject: Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes
Date: Fri, 12 Apr 2013 13:44:38 +0800
Message-ID: <51679F46.7030901@gmail.com>
In-Reply-To: <CAH3drwYee1mKMPcT5QJNsaGGEvJHNTPFEvndpvS+HkeuwwAYmg@mail.gmail.com>
Hi Jerome,
On 04/12/2013 10:57 AM, Jerome Glisse wrote:
> On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com> wrote:
>
> Hi Jerome,
>
> On 04/12/2013 02:38 AM, Jerome Glisse wrote:
>
> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote:
>
> Hi Jerome,
> On 04/11/2013 04:45 AM, Jerome Glisse wrote:
>
> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons
> wrote:
>
> Hi Jerome,
> On 04/09/2013 10:21 PM, Jerome Glisse wrote:
>
> On Tue, Apr 09, 2013 at 04:28:09PM +0800,
> Simon Jeons wrote:
>
> Hi Jerome,
> On 02/10/2013 12:29 AM, Jerome Glisse wrote:
>
> On Sat, Feb 9, 2013 at 1:05 AM, Michel
> Lespinasse <walken@google.com> wrote:
>
> On Fri, Feb 8, 2013 at 3:18 AM,
> Shachar Raindel <raindel@mellanox.com> wrote:
>
> Hi,
>
> We would like to present a
> reference implementation for
> safely sharing
> memory pages from user space
> with the hardware, without
> pinning.
>
> We will be happy to hear the
> community feedback on our
> prototype
> implementation, and
> suggestions for future
> improvements.
>
> We would also like to discuss
> adding features to the core MM
> subsystem to
> assist hardware access to user
> memory without pinning.
>
> This sounds kinda scary TBH;
> however I do understand the need
> for such
> technology.
>
> I think one issue is that many MM
> developers are insufficiently aware
> of such developments; having a
> technology presentation would probably
> help there; but traditionally
> LSF/MM sessions are more interactive
> between developers who are already
> quite familiar with the technology.
> I think it would help if you could
> send in advance a detailed
> presentation of the problem and
> the proposed solutions (and then what
> they require of the MM layer) so
> people can be better prepared.
>
> And first I'd like to ask, aren't
> IOMMUs supposed to already largely
> solve this problem ? (probably a
> dumb question, but that just tells
> you how much you need to explain :)
>
> For GPUs the motivation is threefold.
> With the advance of GPU compute
> and with newer graphics programs we
> see a massive increase in GPU
> memory consumption. We can easily
> reach buffers that are bigger than
> 1 GB. So the first motivation is to
> directly use the memory the
> user allocated through malloc on the
> GPU; this avoids copying 1 GB of
> data with the CPU to the GPU buffer.
> The second, and most important
> to GPU compute, is using the GPU
> seamlessly with the CPU. To
> achieve this you want the programmer
> to have a single address space on
> the CPU and GPU, so that the same
> address points to the same object on
> the GPU as on the CPU. This would also be
> a tremendously cleaner design from
> the driver's point of view with regard
> to memory management.
>
> And last, the most important: with
> such big buffers (> 1 GB)
> memory pinning is becoming way too
> expensive and also drastically
> reduces the freedom of the mm to free
> pages for other processes. Most of
> the time a small window (everything
> is relative; the window can be >
> 100 MB, not so small :)) of the
> object will be in use by the
> hardware. The hardware pagefault
> support would avoid the necessity to
>
> What's the meaning of hardware pagefault?
>
> It's a PCIe extension (well, it's a combination of extensions that
> allow it; see http://www.pcisig.com/specifications/iov/ats/). The idea
> is that the IOMMU can trigger a regular page fault inside a process
> address space on behalf of the hardware. The only IOMMU supporting
> that right now is the AMD IOMMU v2 that you find on recent AMD
> platforms.
>
> Why do we need a hardware page fault? A regular page fault is
> triggered by the CPU MMU, correct?
>
> Well, here I abuse the term "regular page fault". The idea is that
> with hardware page faults you don't need to pin memory or take a
> reference on a page for the hardware to use it, so that the kernel
> can free as usual pages that would otherwise have been
>
> For the case where the GPU needs to pin memory, why does the GPU need
> to grab the memory of a normal process instead of allocating memory
> for itself?
>
> Pinned memory is today's world, where the GPU allocates its own memory
> (GBs of memory) that disappears from kernel control, i.e. the kernel
> can no longer reclaim this memory; it's lost memory (I have already
> had complaints about that from users who saw GBs of memory vanish and
> couldn't understand why the GPU was using so much).
>
> In tomorrow's world we want the GPU to be able to access memory that
> the application allocated through a simple malloc, and we want the
> kernel to be able to recycle any page at any time, because of memory
> pressure or because the kernel decides to do so.
>
> That's just what we want to do. To achieve it we are getting hw that
> can do pagefaults. No change to the kernel core mm code is needed
> (some improvements might be made).
>
>
> The memory disappears because you hold a reference (gup) against it,
> correct? In tomorrow's world you want the page fault triggered through
> the iommu driver, which calls get_user_pages; that will also take a
> reference (since gup is called), won't it? Anyway, assuming tomorrow's
> world doesn't take a reference, don't we need to care that a page
> used by the GPU gets reclaimed?
>
>
> Right now the code uses gup because it's convenient, but it drops the
> reference right after the fault. So the reference is held only for a
> short period of time.
Are you sure gup will drop the reference right after the fault? I dug
through the code again and failed to verify it. Could you point it out to me?
>
> No, you don't need to care about reclaim thanks to the mmu notifier,
> i.e. before a page is removed the mmu notifier is called, and the iommu
> has registered a notifier, so it gets the invalidate event and
> invalidates the device tlb, and things go on. If the gpu accesses the
> page again, a new pagefault happens and a new page is allocated.
Good idea! ;-)
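To check that I follow the flow described above, here is a minimal
userspace sketch of it as I understand it. It is a toy model and not
kernel code: every name in it (toy_mm, toy_dev_fault, toy_reclaim, and
so on) is hypothetical, the "device tlb" is just an array of flags, and
the callback only loosely stands in for an mmu notifier invalidate hook.
The fault path takes a reference only for the duration of the fault,
and reclaim calls the notifier before it frees the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. None of these names are real kernel APIs. */
#define TOY_NPAGES 4

struct toy_page {
    int refcount;  /* references currently held on the page */
    bool present;  /* page is currently backed by memory */
};

struct toy_mm;

/* Callback type, loosely modelled on an invalidate notification. */
typedef void (*toy_invalidate_fn)(struct toy_mm *mm, size_t vpn);

struct toy_mm {
    struct toy_page pages[TOY_NPAGES]; /* stand-in for the CPU page table */
    bool dev_tlb[TOY_NPAGES];          /* device-side translations */
    toy_invalidate_fn notifier;        /* registered by the "iommu driver" */
    int dev_faults;                    /* device faults serviced so far */
};

/* "iommu driver" side: drop the device translation when notified. */
static void toy_dev_invalidate(struct toy_mm *mm, size_t vpn)
{
    mm->dev_tlb[vpn] = false;
}

static void toy_mm_init(struct toy_mm *mm)
{
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        mm->pages[i].refcount = 0;
        mm->pages[i].present = false;
        mm->dev_tlb[i] = false;
    }
    mm->notifier = toy_dev_invalidate;
    mm->dev_faults = 0;
}

/* Device fault: back the page, map it for the device, keep no reference. */
static void toy_dev_fault(struct toy_mm *mm, size_t vpn)
{
    struct toy_page *p = &mm->pages[vpn];

    p->present = true;        /* allocate a (new) page if needed */
    p->refcount++;            /* temporary, gup-like reference */
    mm->dev_tlb[vpn] = true;  /* install the device translation */
    p->refcount--;            /* dropped right after the fault */
    mm->dev_faults++;
}

/* Device access: use the device translation, faulting first if absent. */
static void toy_dev_access(struct toy_mm *mm, size_t vpn)
{
    if (!mm->dev_tlb[vpn])
        toy_dev_fault(mm, vpn);
}

/* Reclaim: refuse pinned pages, otherwise notify first and then free. */
static bool toy_reclaim(struct toy_mm *mm, size_t vpn)
{
    struct toy_page *p = &mm->pages[vpn];

    if (!p->present || p->refcount > 0)
        return false;
    if (mm->notifier)
        mm->notifier(mm, vpn);
    p->present = false;
    return true;
}

/* Walk through the scenario; returns the fault count, or -1 on surprise. */
int toy_demo(void)
{
    struct toy_mm mm;

    toy_mm_init(&mm);
    toy_dev_access(&mm, 1);      /* first touch: device fault */
    toy_dev_access(&mm, 1);      /* translation cached: no fault */
    if (!toy_reclaim(&mm, 1))    /* no reference is held, so this works */
        return -1;
    if (mm.dev_tlb[1])           /* notifier must have invalidated it */
        return -1;
    toy_dev_access(&mm, 1);      /* page is gone: device faults again */
    return mm.dev_faults;
}
```

In this model toy_demo() services two device faults: one on the first
touch and one after the page was reclaimed. If the fault path kept its
reference instead of dropping it, toy_reclaim() would refuse the page,
which is the pinned case discussed earlier in the thread.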
>
> All this code is upstream in the linux kernel, just read it. There is
> just no device that uses it yet.
>
> That being said, we will want improvements so that pages that are hot
> in the device are not reclaimed. But it can work without such
> improvements.
>
> Cheers,
> Jerome