From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org,
"Linus Torvalds" <torvalds@linux-foundation.org>,
joro@8bytes.org, "Mel Gorman" <mgorman@suse.de>,
"H. Peter Anvin" <hpa@zytor.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Johannes Weiner" <jweiner@redhat.com>,
"Larry Woodman" <lwoodman@redhat.com>,
"Rik van Riel" <riel@redhat.com>,
"Dave Airlie" <airlied@redhat.com>,
"Brendan Conoboy" <blc@redhat.com>,
"Joe Donohue" <jdonohue@redhat.com>,
"Christophe Harle" <charle@nvidia.com>,
"Duncan Poole" <dpoole@nvidia.com>,
"Sherry Cheung" <SCheung@nvidia.com>,
"Subhash Gutti" <sgutti@nvidia.com>,
"John Hubbard" <jhubbard@nvidia.com>,
"Mark Hairgrove" <mhairgrove@nvidia.com>,
"Lucien Dunning" <ldunning@nvidia.com>,
"Cameron Buschardt" <cabuschardt@nvidia.com>,
"Arvind Gopalakrishnan" <arvindg@nvidia.com>,
"Haggai Eran" <haggaie@mellanox.com>,
"Liran Liss" <liranl@mellanox.com>,
"Roland Dreier" <roland@purestorage.com>,
"Ben Sander" <ben.sander@amd.com>,
"Greg Stoner" <Greg.Stoner@amd.com>,
"John Bridgman" <John.Bridgman@amd.com>,
"Michael Mantor" <Michael.Mantor@amd.com>,
"Paul Blinzer" <Paul.Blinzer@amd.com>,
"Leonid Shamis" <Leonid.Shamis@amd.com>,
"Laurent Morichetti" <Laurent.Morichetti@amd.com>,
"Alexander Deucher" <Alexander.Deucher@amd.com>,
"Jatin Kumar" <jakumar@nvidia.com>
Subject: Re: [PATCH v12 08/29] HMM: add device page fault support v6.
Date: Wed, 23 Mar 2016 15:59:32 +0530
Message-ID: <87egb1trlf.fsf@linux.vnet.ibm.com>
In-Reply-To: <20160323100919.GA2888@gmail.com>
Jerome Glisse <j.glisse@gmail.com> writes:
> On Wed, Mar 23, 2016 at 12:22:23PM +0530, Aneesh Kumar K.V wrote:
>> Jérôme Glisse <jglisse@redhat.com> writes:
>>
>> > This patch adds helpers for device page faults. These helpers fill
>> > the mirror page table using the CPU page table, synchronizing
>> > with any update to the CPU page table.
>> >
>> > Changed since v1:
>> > - Add comment about directory lock.
>> >
>> > Changed since v2:
>> > - Check for mirror->hmm in hmm_mirror_fault()
>> >
>> > Changed since v3:
>> > - Adapt to HMM page table changes.
>> >
>> > Changed since v4:
>> > - Fix PROT_NONE, ie do not populate from protnone pte.
>> > - Fix huge pmd handling (start address may != pmd start address)
>> > - Fix missing entry case.
>> >
>> > Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
>> > Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
>> > Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
>> > Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
>> > Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> > Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
>> > ---
>>
>>
>> ....
>> ....
>>
>> > +static int hmm_mirror_fault_hpmd(struct hmm_mirror *mirror,
>> > +				 struct hmm_event *event,
>> > +				 struct vm_area_struct *vma,
>> > +				 struct hmm_pt_iter *iter,
>> > +				 pmd_t *pmdp,
>> > +				 struct hmm_mirror_fault *mirror_fault,
>> > +				 unsigned long start,
>> > +				 unsigned long end)
>> > +{
>> > +	struct page *page;
>> > +	unsigned long addr, pfn;
>> > +	unsigned flags = FOLL_TOUCH;
>> > +	spinlock_t *ptl;
>> > +	int ret;
>> > +
>> > +	ptl = pmd_lock(mirror->hmm->mm, pmdp);
>> > +	if (unlikely(!pmd_trans_huge(*pmdp))) {
>> > +		spin_unlock(ptl);
>> > +		return -EAGAIN;
>> > +	}
>> > +	flags |= event->etype == HMM_DEVICE_WFAULT ? FOLL_WRITE : 0;
>> > +	page = follow_trans_huge_pmd(vma, start, pmdp, flags);
>> > +	pfn = page_to_pfn(page);
>> > +	spin_unlock(ptl);
>> > +
>> > +	/* Just fault in the whole PMD. */
>> > +	start &= PMD_MASK;
>> > +	end = start + PMD_SIZE - 1;
>> > +
>> > +	if (!pmd_write(*pmdp) && event->etype == HMM_DEVICE_WFAULT)
>> > +		return -ENOENT;
>> > +
>> > +	for (ret = 0, addr = start; !ret && addr < end;) {
>> > +		unsigned long i, next = end;
>> > +		dma_addr_t *hmm_pte;
>> > +
>> > +		hmm_pte = hmm_pt_iter_populate(iter, addr, &next);
>> > +		if (!hmm_pte)
>> > +			return -ENOMEM;
>> > +
>> > +		i = hmm_pt_index(&mirror->pt, addr, mirror->pt.llevel);
>> > +
>> > +		/*
>> > +		 * The directory lock protect against concurrent clearing of
>> > +		 * page table bit flags. Exceptions being the dirty bit and
>> > +		 * the device driver private flags.
>> > +		 */
>> > +		hmm_pt_iter_directory_lock(iter);
>> > +		do {
>> > +			if (!hmm_pte_test_valid_pfn(&hmm_pte[i])) {
>> > +				hmm_pte[i] = hmm_pte_from_pfn(pfn);
>> > +				hmm_pt_iter_directory_ref(iter);
>>
>> I looked at that and it is actually
>> static inline void hmm_pt_iter_directory_ref(struct hmm_pt_iter *iter)
>> {
>> 	BUG_ON(!iter->ptd[iter->pt->llevel - 1]);
>> 	hmm_pt_directory_ref(iter->pt, iter->ptd[iter->pt->llevel - 1]);
>> }
>>
>> static inline void hmm_pt_directory_ref(struct hmm_pt *pt,
>> 					struct page *ptd)
>> {
>> 	if (!atomic_inc_not_zero(&ptd->_mapcount))
>> 		/* Illegal this should not happen. */
>> 		BUG();
>> }
>>
>> what is the mapcount update about ?
>
> Unlike the regular CPU page table, we do not rely on unmap to prune the HMM
> mirror page table. Rather, we free/prune it aggressively once the device no
> longer has anything mirrored in a given range.
Which patch does this?
>
> As such, mapcount is used to keep track of how many valid entries there are
> per directory.
>
> Moreover, mapcount is also used to protect against concurrent pruning: when
> you walk through the page table you increment the refcount by one along your
> way. When you are done walking you decrement the refcount.
>
> Because of that last aspect, the mapcount can never reach zero just because we
> unmap a page; it can only reach zero once we clean up after the page table walk.
>
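A minimal userspace sketch of the counting scheme described above may help. It assumes C11 atomics as a stand-in for the kernel's atomic_inc_not_zero(); struct dir and the dir_* helpers are hypothetical names for illustration, not code from the patch series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one mirror page table directory. */
struct dir {
	atomic_int count;	/* valid entries plus active walkers */
	bool freed;		/* set once the directory has been pruned */
};

/* Increment unless the count is already zero, like atomic_inc_not_zero(). */
static bool dir_get_unless_zero(struct dir *d)
{
	int old = atomic_load(&d->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&d->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; prune the directory when the last one goes away. */
static void dir_put(struct dir *d)
{
	if (atomic_fetch_sub(&d->count, 1) == 1)
		d->freed = true;
}

/* A walker pins the directory for the duration of the walk. */
static bool dir_walk_begin(struct dir *d) { return dir_get_unless_zero(d); }
static void dir_walk_end(struct dir *d) { dir_put(d); }

/*
 * Populating an entry takes one reference. The caller is assumed to be
 * walking already, so a zero count here would be a bug (the patch uses
 * BUG() for the same case).
 */
static void dir_entry_set(struct dir *d)
{
	bool ok = dir_get_unless_zero(d);

	assert(ok);
	(void)ok;
}

/* Clearing an entry (an unmap) drops the reference taken when it was set. */
static void dir_entry_clear(struct dir *d) { dir_put(d); }
```

In this sketch, clearing the last valid entry while a walk is in progress leaves the count at one, so the directory is only pruned by the final dir_walk_end().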
>>
>> > +			}
>> > +			BUG_ON(hmm_pte_pfn(hmm_pte[i]) != pfn);
>> > +			if (pmd_write(*pmdp))
>> > +				hmm_pte_set_write(&hmm_pte[i]);
>> > +		} while (addr += PAGE_SIZE, pfn++, i++, addr != next);
>> > +		hmm_pt_iter_directory_unlock(iter);
>> > +		mirror_fault->addr = addr;
>> > +	}
>> > +
>>
>> So we don't have huge page mappings in the hmm page table?
>
> No, we don't right now. The first reason is that I wanted to keep things simple
> for device drivers. The second motivation is to keep the first patchset simpler,
> especially the page migration code.
>
> Memory overhead is 2MB per GB of virtual memory mirrored. There is no TLB here.
> I believe adding huge pages can be done as part of a later patchset if it makes
> sense.
>
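For reference, the 2MB per GB figure can be reproduced with a short calculation. This is a sketch under two assumptions that are not stated in the mail: 4 KiB pages and one 8-byte entry per mirrored page. It counts only the last level of the mirror page table; mirror_pt_overhead() and the SKETCH_* macros are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 4 KiB pages and one 8-byte entry per page in the last level. */
#define SKETCH_PAGE_SIZE  4096ULL
#define SKETCH_ENTRY_SIZE sizeof(uint64_t)

static_assert(sizeof(uint64_t) == 8, "entries are assumed to be 8 bytes");

/*
 * Bytes of last-level mirror page table needed to cover `bytes` of
 * virtual memory. Upper directory levels are ignored in this sketch.
 */
static uint64_t mirror_pt_overhead(uint64_t bytes)
{
	uint64_t entries = bytes / SKETCH_PAGE_SIZE;

	return entries * SKETCH_ENTRY_SIZE;
}
```

With these assumptions, 1 GiB of mirrored address space needs 262144 entries of 8 bytes each, which is 2 MiB.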
One thing I am wondering is whether we can do the patch series in such a
way that we move the page table mirror to the device driver. That is, an
hmm fault will look at the cpu page table and call into a device driver
callback with the pte entry details. It is up to the device driver to
maintain a mirror table if needed. Similarly, for a cpu fault we call into
an hmm callback to find the per-pte dma_addr and do a migrate using the
copy_from_device callback. I haven't fully looked at how easy this would
be, but I guess a lot of the code in this series has to do with the mirror
table, and I am wondering whether there is a simpler version we can get
upstream that hides it within a driver.

Also, would it simplify things to have interfaces that operate on one pte
rather than an array of ptes?
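
To make the idea concrete, here is a rough userspace sketch of what a per-pte driver callback could look like. Everything in it is hypothetical (the sketch_* names, the flat pfn array standing in for the cpu page table, pfn 0 meaning "not present"); it is not an existing HMM interface, only an illustration of the driver owning the mirror.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/* Hypothetical description of one cpu pte, handed to the driver. */
struct sketch_pte_info {
	unsigned long addr;	/* virtual address of the page */
	unsigned long pfn;	/* page frame number backing it */
	int writable;		/* non-zero if writes are allowed */
};

/* Hypothetical driver callbacks; the driver owns any mirror table. */
struct sketch_device_ops {
	int (*update_pte)(void *driver_data, const struct sketch_pte_info *info);
};

/*
 * Toy fault path: walk a flat array standing in for the cpu page table
 * and report each present entry to the driver, one pte at a time.
 */
static int sketch_device_fault(const struct sketch_device_ops *ops,
			       void *driver_data,
			       const unsigned long *cpu_pfns, size_t npages,
			       unsigned long start, int writable)
{
	assert(ops && ops->update_pte);

	for (size_t i = 0; i < npages; i++) {
		struct sketch_pte_info info = {
			.addr = start + ((unsigned long)i << SKETCH_PAGE_SHIFT),
			.pfn = cpu_pfns[i],
			.writable = writable,
		};
		int ret;

		if (!cpu_pfns[i])
			return -1;	/* toy "not present" case */
		ret = ops->update_pte(driver_data, &info);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example driver that keeps its own (tiny) mirror table. */
struct sketch_driver {
	unsigned long mirror[8];
	size_t count;
};

static int sketch_driver_update_pte(void *driver_data,
				    const struct sketch_pte_info *info)
{
	struct sketch_driver *drv = driver_data;

	if (drv->count >= 8)
		return -2;	/* toy "mirror full" case */
	drv->mirror[drv->count++] = info->pfn;
	return 0;
}
```

A driver that needs no mirror at all could implement update_pte() by programming its hardware directly, which is the kind of simplification the question above is about.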
-aneesh