From: Jerome Glisse <jglisse@redhat.com>
To: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org,
Linus Torvalds <torvalds@linux-foundation.org>,
joro@8bytes.org, Mel Gorman <mgorman@suse.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Johannes Weiner <jweiner@redhat.com>,
Larry Woodman <lwoodman@redhat.com>,
Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
Brendan Conoboy <blc@redhat.com>,
Joe Donohue <jdonohue@redhat.com>,
Christophe Harle <charle@nvidia.com>,
Duncan Poole <dpoole@nvidia.com>,
Sherry Cheung <SCheung@nvidia.com>,
Subhash Gutti <sgutti@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>,
Lucien Dunning <ldunning@nvidia.com>,
Cameron Buschardt <cabuschardt@nvidia.com>,
Arvind Gopalakrishnan <arvindg@nvidia.com>,
Haggai Eran <haggaie@mellanox.com>,
Shachar Raindel <raindel@mellanox.com>,
Liran Liss <liranl@mellanox.com>,
Roland Dreier <roland@purestorage.com>,
Ben Sander <ben.sander@amd.com>,
Greg Stoner <Greg.Stoner@amd.com>,
John Bridgman <John.Bridgman@amd.com>,
Michael Mantor <Michael.Mantor@amd.com>,
Paul Blinzer <Paul.Blinzer@amd.com>,
Leonid Shamis <Leonid.Shamis@amd.com>,
Laurent Morichetti <Laurent.Morichetti@amd.com>,
Alexander Deucher <Alexander.Deucher@amd.com>
Subject: Re: [PATCH 02/15] mmu_notifier: keep track of active invalidation ranges v4
Date: Tue, 1 Sep 2015 10:58:11 -0400 [thread overview]
Message-ID: <20150901145811.GA3058@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1508312003400.18393@mdh-linux64-2.nvidia.com>
On Mon, Aug 31, 2015 at 08:27:17PM -0700, Mark Hairgrove wrote:
> On Thu, 13 Aug 2015, Jerome Glisse wrote:
[...]
Will fix syntax.
[...]
> > +/* mmu_notifier_range_wait_valid() - wait for a range to have no conflict with
> > + * active invalidation.
> > + *
> > + * @mm: The mm struct.
> > + * @start: Start address of the range (inclusive).
> > + * @end: End address of the range (exclusive).
> > + *
> > + * This function wait for any active range invalidation that conflict with the
> > + * given range, to end.
> > + *
> > + * Note by the time this function return a new range invalidation that conflict
> > + * might have started. So you need to atomically block new range and query
> > + * again if range is still valid with mmu_notifier_range_is_valid(). So call
> > + * sequence should be :
> > + *
> > + * again:
> > + * mmu_notifier_range_wait_valid()
> > + * // block new invalidation using that lock inside your range_start callback
> > + * lock_block_new_invalidation()
> > + * if (!mmu_notifier_range_is_valid())
> > + * goto again;
> > + * unlock()
>
> I think this example sequence can deadlock so I wouldn't want to encourage
> its use. New invalidation regions are added to the list before the
> range_start callback is invoked.
>
> Thread A Thread B
> ----------------- -----------------
> mmu_notifier_range_wait_valid
> // returns
> __mmu_notifier_invalidate_range_start
> list_add_tail
> lock_block_new_invalidation
> ->invalidate_range_start
> // invalidation blocked in callback
> mmu_notifier_range_is_valid // fails
> goto again
> mmu_notifier_range_wait_valid // deadlock
>
> mmu_notifier_range_wait_valid can't finish until thread B's callback
> returns, but thread B's callback can't return because it's blocked.
>
> I see that HMM in later patches takes the approach of not holding the lock
> when mmu_notifier_range_is_valid returns false. Instead of stalling new
> invalidations it returns -EAGAIN to the caller. While that resolves the
> deadlock, it won't prevent the faulting thread from being starved in the
> pathological case.
The comment here is not clear, what HMM does is what is intended. If
mmu_notifier_range_is_valid() return false then you drop lock and try
again. I am not sure we should care about the starve case as it can
not happen, it would mean that something keeps invalidating over and
over the same range of address space of a process. I do not see how
such thing would happen.
>
> Is it out of the question to build a lock into the mmu notifier API
> directly? It's a little worrisome to me that the complexity for this
> locking is pushed into the callbacks rather than handled in the core.
> Something like this:
>
> mmu_notifier_range_lock(start, end)
> mmu_notifier_range_unlock(start, end)
If a range is about to be invalidated it is better to avoid faulting
in memory in the device as that same memory is about to be invalidated.
This is why i always have invalidation take precedence over device fault.
> If that's not feasible and we have to stick with the current approach,
> then I suggest changing the "valid" name. "valid" doesn't have a clear
> meaning at first glance because the reader doesn't know what would make a
> range "valid." How about "active" instead? Then the names would look
> something like this, assuming the polarity matches their current versions:
>
> mmu_notifier_range_inactive_locked
> mmu_notifier_range_inactive
> mmu_notifier_range_wait_active
Those names are better i will update the patch accordingly.
Thanks for the review,
Jerome
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-09-01 14:58 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-13 19:15 HMM (Heterogeneous Memory Management) v10 Jérôme Glisse
2015-08-13 19:15 ` [PATCH 01/15] mmu_notifier: add event information to address invalidation v8 Jérôme Glisse
2015-08-13 19:15 ` [PATCH 02/15] mmu_notifier: keep track of active invalidation ranges v4 Jérôme Glisse
2015-09-01 3:27 ` Mark Hairgrove
2015-09-01 14:58 ` Jerome Glisse [this message]
2015-08-13 19:15 ` [PATCH 03/15] mmu_notifier: pass page pointer to mmu_notifier_invalidate_page() v2 Jérôme Glisse
2015-08-13 19:15 ` [PATCH 04/15] mmu_notifier: allow range invalidation to exclude a specific mmu_notifier Jérôme Glisse
2015-08-13 19:15 ` [PATCH 05/15] HMM: introduce heterogeneous memory management v5 Jérôme Glisse
2015-08-13 19:15 ` [PATCH 06/15] HMM: add HMM page table v4 Jérôme Glisse
2015-08-13 19:15 ` [PATCH 07/15] HMM: add per mirror " Jérôme Glisse
2015-08-13 19:15 ` [PATCH 08/15] HMM: add device page fault support v4 Jérôme Glisse
2015-08-13 19:15 ` [PATCH 09/15] HMM: add mm page table iterator helpers Jérôme Glisse
2015-08-13 19:15 ` [PATCH 10/15] HMM: use CPU page table during invalidation Jérôme Glisse
2015-08-13 19:15 ` [PATCH 11/15] HMM: add discard range helper (to clear and free resources for a range) Jérôme Glisse
2015-08-13 19:15 ` [PATCH 12/15] HMM: add dirty range helper (toggle dirty bit inside mirror page table) v2 Jérôme Glisse
2015-08-13 19:15 ` [PATCH 13/15] HMM: DMA map memory on behalf of device driver v2 Jérôme Glisse
2015-08-13 19:15 ` [PATCH 14/15] HMM: add documentation explaining HMM internals and how to use it Jérôme Glisse
2015-08-13 19:15 ` [PATCH 15/15] hmm/dummy: dummy driver for testing and showcasing the HMM API Jérôme Glisse
2015-09-23 10:21 ` HMM (Heterogeneous Memory Management) v10 Pavel Machek
-- strict thread matches above, loose matches on Subject: below --
2015-07-17 18:52 [PATCH 00/15] HMM (Heterogeneous Memory Management) v9 Jérôme Glisse
2015-07-17 18:52 ` [PATCH 02/15] mmu_notifier: keep track of active invalidation ranges v4 Jérôme Glisse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150901145811.GA3058@redhat.com \
--to=jglisse@redhat.com \
--cc=Alexander.Deucher@amd.com \
--cc=Greg.Stoner@amd.com \
--cc=John.Bridgman@amd.com \
--cc=Laurent.Morichetti@amd.com \
--cc=Leonid.Shamis@amd.com \
--cc=Michael.Mantor@amd.com \
--cc=Paul.Blinzer@amd.com \
--cc=SCheung@nvidia.com \
--cc=aarcange@redhat.com \
--cc=airlied@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=arvindg@nvidia.com \
--cc=ben.sander@amd.com \
--cc=blc@redhat.com \
--cc=cabuschardt@nvidia.com \
--cc=charle@nvidia.com \
--cc=dpoole@nvidia.com \
--cc=haggaie@mellanox.com \
--cc=hpa@zytor.com \
--cc=jdonohue@redhat.com \
--cc=jhubbard@nvidia.com \
--cc=joro@8bytes.org \
--cc=jweiner@redhat.com \
--cc=ldunning@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liranl@mellanox.com \
--cc=lwoodman@redhat.com \
--cc=mgorman@suse.de \
--cc=mhairgrove@nvidia.com \
--cc=peterz@infradead.org \
--cc=raindel@mellanox.com \
--cc=riel@redhat.com \
--cc=roland@purestorage.com \
--cc=sgutti@nvidia.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).