From: Jerome Glisse <j.glisse@gmail.com>
To: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"joro@8bytes.org" <joro@8bytes.org>,
"Mel Gorman" <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Johannes Weiner" <jweiner@redhat.com>,
"Larry Woodman" <lwoodman@redhat.com>,
"Rik van Riel" <riel@redhat.com>,
"Dave Airlie" <airlied@redhat.com>,
"Brendan Conoboy" <blc@redhat.com>,
"Joe Donohue" <jdonohue@redhat.com>,
"Duncan Poole" <dpoole@nvidia.com>,
"Sherry Cheung" <SCheung@nvidia.com>,
"Subhash Gutti" <sgutti@nvidia.com>,
"John Hubbard" <jhubbard@nvidia.com>,
"Lucien Dunning" <ldunning@nvidia.com>,
"Cameron Buschardt" <cabuschardt@nvidia.com>,
"Arvind Gopalakrishnan" <arvindg@nvidia.com>,
"Haggai Eran" <haggaie@mellanox.com>,
"Shachar Raindel" <raindel@mellanox.com>,
"Liran Liss" <liranl@mellanox.com>,
"Roland Dreier" <roland@purestorage.com>,
"Ben Sander" <ben.sander@amd.com>,
"Greg Stoner" <Greg.Stoner@amd.com>,
"John Bridgman" <John.Bridgman@amd.com>,
"Michael Mantor" <Michael.Mantor@amd.com>,
"Paul Blinzer" <Paul.Blinzer@amd.com>,
"Laurent Morichetti" <Laurent.Morichetti@amd.com>,
"Alexander Deucher" <Alexander.Deucher@amd.com>,
"Oded Gabbay" <Oded.Gabbay@amd.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Jatin Kumar" <jakumar@nvidia.com>
Subject: Re: [PATCH 06/36] HMM: add HMM page table v2.
Date: Mon, 29 Jun 2015 10:43:06 -0400
Message-ID: <20150629144305.GA2173@gmail.com>
In-Reply-To: <alpine.DEB.2.00.1506261827090.20890@mdh-linux64-2.nvidia.com>
On Fri, Jun 26, 2015 at 06:34:16PM -0700, Mark Hairgrove wrote:
>
>
> On Fri, 26 Jun 2015, Jerome Glisse wrote:
>
> > On Thu, Jun 25, 2015 at 03:57:29PM -0700, Mark Hairgrove wrote:
> > > On Thu, 21 May 2015, j.glisse@gmail.com wrote:
> > > > From: Jerome Glisse <jglisse@redhat.com>
> > > > [...]
> > > > +
> > > > +void hmm_pt_iter_init(struct hmm_pt_iter *iter);
> > > > +void hmm_pt_iter_fini(struct hmm_pt_iter *iter, struct hmm_pt *pt);
> > > > +unsigned long hmm_pt_iter_next(struct hmm_pt_iter *iter,
> > > > + struct hmm_pt *pt,
> > > > + unsigned long addr,
> > > > + unsigned long end);
> > > > +dma_addr_t *hmm_pt_iter_update(struct hmm_pt_iter *iter,
> > > > + struct hmm_pt *pt,
> > > > + unsigned long addr);
> > > > +dma_addr_t *hmm_pt_iter_fault(struct hmm_pt_iter *iter,
> > > > + struct hmm_pt *pt,
> > > > + unsigned long addr);
> > >
> > > I've got a few more thoughts on hmm_pt_iter after looking at some of the
> > > later patches. I think I've convinced myself that this patch functionally
> > > works as-is, but I've got some suggestions and questions about the design.
> > >
> > > Right now there are these three major functions:
> > >
> > > 1) hmm_pt_iter_update(addr)
> > > - Returns the hmm_pte * for addr, or NULL if none exists.
> > >
> > > 2) hmm_pt_iter_fault(addr)
> > > - Returns the hmm_pte * for addr, allocating a new one if none exists.
> > >
> > > 3) hmm_pt_iter_next(addr, end)
> > > - Returns the next possibly-valid address. The caller must use
> > > hmm_pt_iter_update to check if there really is an hmm_pte there.
> > >
> > > In my view, there are two sources of confusion here:
> > > - Naming. "update" shares a name with the HMM mirror callback, and it also
> > > implies that the page tables are "updated" as a result of the call.
> > > "fault" likewise implies that the function handles a fault in some way.
> > > Neither of these implications are true.
> >
> > Maybe hmm_pt_iter_walk & hmm_pt_iter_populate are better name ?
>
> hmm_pt_iter_populate sounds good. See below for _walk.
>
>
> >
> >
> > > - hmm_pt_iter_next and hmm_pt_iter_update have some overlapping
> > > functionality when compared to traditional iterators, requiring the
> > > callers to all do this sort of thing:
> > >
> > > hmm_pte = hmm_pt_iter_update(&iter, &mirror->pt, addr);
> > > if (!hmm_pte) {
> > > addr = hmm_pt_iter_next(&iter, &mirror->pt,
> > > addr, event->end);
> > > continue;
> > > }
> > >
> > > Wouldn't it be more efficient and simpler to have _next do all the
> > > iteration internally so it always returns the next valid entry? Then you
> > > could combine _update and _next into a single function, something along
> > > these lines (which also addresses the naming concern):
> > >
> > > void hmm_pt_iter_init(iter, pt, start, end);
> > > unsigned long hmm_pt_iter_next(iter, hmm_pte *);
> > > unsigned long hmm_pt_iter_next_alloc(iter, hmm_pte *);
> > >
> > > hmm_pt_iter_next would return the address and ptep of the next valid
> > > entry, taking the place of the existing _update and _next functions.
> > > hmm_pt_iter_next_alloc takes the place of _fault.
> > >
> > > Also, since the _next functions don't take in an address, the iterator
> > > doesn't have to handle the input addr being different from iter->cur.
> >
> > It would still need to do the same kind of test, this test is really to
> > know when you switch from one directory to the next and to drop and take
> > reference accordingly.
>
> But all of the directory references are already hidden entirely in the
> iterator _update function. The caller only has to worry about taking
> references on the bottom level, so I don't understand why the iterator
> needs to return to the caller when it hits the end of a directory. Or for
> that matter, why it returns every possible index within a directory to the
> caller whether that index is valid or not.

The iterator is what protects against concurrent freeing of the directory, so
it has to return to the caller on directory boundaries (on a 64-bit arch with
64-bit ptes it has to return every 512 entries). Otherwise pt_iter_fini()
would have to walk over the whole directory range again just to drop the
references, and that does not sound like a good idea.

So really, with what you are asking for, it would be:

hmm_pt_iter_init(&iter, start, end);
for (next = pt_iter_next(&iter, &ptep); next < end;
     next = pt_iter_next(&iter, &ptep)) {
	/*
	 * Here ptep is valid up to the next address; past that point you
	 * have to call pt_iter_next() again to switch to the next
	 * directory.
	 */
	addr = max(start, next - (~HMM_PMD_MASK + 1));
	for (; addr < next; addr += PAGE_SIZE, ptep++) {
		/* access ptep */
	}
}

My point is that internally pt_iter_next() will do the exact same test it is
doing now between cur and addr. It is just that addr is no longer explicit;
the iterator infers it.
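
To make that concrete, here is a minimal sketch of the kind of boundary test
pt_iter_next() has to do internally between its cached cursor and the next
address. This is not the actual HMM code and the names below are made up for
illustration (the real code uses HMM_PMD_MASK); it only shows the directory
crossing check that decides when the directory reference must be moved:

#include <stdbool.h>

#define HMM_PAGE_SIZE	(1UL << 12)
/* On a 64-bit arch with 64-bit ptes, one directory covers 512 entries. */
#define HMM_PTE_PER_DIR	512UL
#define HMM_DIR_MASK	(~(HMM_PTE_PER_DIR * HMM_PAGE_SIZE - 1))

/*
 * True when going from the cached cursor to addr crosses a directory
 * boundary, i.e. when the iterator must drop its reference on the current
 * directory and take one on the next.
 */
static bool hmm_pt_iter_crosses_dir(unsigned long cur, unsigned long addr)
{
	return (cur & HMM_DIR_MASK) != (addr & HMM_DIR_MASK);
}
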
> If _next only returned to the caller when it hit a valid hmm_pte (or end),
> then only one function would be needed (_next) instead of two
> (_update/_walk and _next).

On the valid entry side, this is because when you are walking the page table
you have no guarantee that the entry will not be cleared from under you (in
case of concurrent invalidation). The only guarantee you have is that if you
are able to read a valid entry from within the update() callback, then that
entry stays valid until you get a new update() callback telling you otherwise.
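
To illustrate that contract (purely a hypothetical driver-side sketch, not the
HMM API; every name below is invented), a mirror can stamp whatever it reads
inside update() with a generation number and bump the generation on every
later invalidating update(), so that stale entries are detectable outside the
callback:

#include <stdbool.h>

/* Hypothetical per-mirror state, for illustration only. */
struct mirror_state {
	unsigned long gen;	/* bumped by every invalidating update() */
};

struct mirror_entry {
	unsigned long pfn;
	unsigned long gen;	/* generation at which the entry was read */
};

/* Called from inside update(): the entry read here can be trusted. */
static void mirror_record(struct mirror_state *m, struct mirror_entry *e,
			  unsigned long pfn)
{
	e->pfn = pfn;
	e->gen = m->gen;
}

/* Called from a later update() that invalidates the range. */
static void mirror_invalidate(struct mirror_state *m)
{
	m->gen++;
}

/* Outside update(), the entry is usable only if nothing invalidated it. */
static bool mirror_entry_valid(const struct mirror_state *m,
			       const struct mirror_entry *e)
{
	return e->gen == m->gen;
}
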
Cheers,
Jerome