From: Jason Gunthorpe <jgg@ziepe.ca>
To: Ralph Campbell <rcampbell@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>,
John Hubbard <jhubbard@nvidia.com>,
Felix.Kuehling@amd.com, linux-rdma@vger.kernel.org,
linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2 hmm 05/11] mm/hmm: Remove duplicate condition test before wait_event_timeout
Date: Fri, 7 Jun 2019 17:44:27 -0300 [thread overview]
Message-ID: <20190607204427.GU14802@ziepe.ca> (raw)
In-Reply-To: <e17aa8c5-790c-d977-2eb8-c18cdaa4cbb3@nvidia.com>
On Fri, Jun 07, 2019 at 01:21:12PM -0700, Ralph Campbell wrote:
> > What I want to get to is a pattern like this:
> >
> > pagefault():
> >
> > hmm_range_register(&range);
> > again:
> > /* On the slow path, if we appear to be live locked then we get
> > the write side of mmap_sem which will break the live lock,
> > otherwise this gets the read lock */
> > if (hmm_range_start_and_lock(&range))
> > goto err;
> >
> > lockdep_assert_held(range->mm->mmap_sem);
> >
> > // Optional: Avoid useless expensive work
> > if (hmm_range_needs_retry(&range))
> > goto again;
> > hmm_range_(touch vmas)
> >
> > take_lock(driver->update);
> > if (hmm_range_end(&range) {
> > release_lock(driver->update);
> > goto again;
> > }
> > // Finish driver updates
> > release_lock(driver->update);
> >
> > // Releases mmap_sem
> > hmm_range_unregister_and_unlock(&range);
> >
> > What do you think?
> >
> > Is it clear?
> >
> > Jason
> >
>
> Are you talking about acquiring mmap_sem in hmm_range_start_and_lock()?
> Usually, the fault code has to lock mmap_sem for read in order to
> call find_vma() so it can set range.vma.
> If HMM drops mmap_sem - which I don't think it should, just return an
> error to tell the caller to drop mmap_sem and retry - the find_vma()
> will need to be repeated as well.
Overall I don't think it makes a lot of sense to sleep for retry in
hmm_range_start_and_lock() while holding mmap_sem. It would be better
to drop that lock, sleep, then re-acquire it as part of the hmm logic.
The find_vma should be done inside the critical section created by
hmm_range_start_and_lock(), not before it. If we are retrying then we
already slept and the additional CPU cost to repeat the find_vma is
immaterial, IMHO?
Do you see a reason why the find_vma() ever needs to be before the
'again' in my above example? range.vma does not need to be set for
range_register.
> I'm also not sure about acquiring the mmap_sem for write as way to
> mitigate thrashing. It seems to me that if a device and a CPU are
> both faulting on the same page,
One of the reasons to prefer this approach is that it means we don't
need to keep track of which ranges we are faulting, and if there is a
lot of *unrelated* fault activity (unlikely?) we can resolve it using
mmap sem instead of this elaborate ranges scheme and related
locking.
This would reduce the overall work in the page fault and
invalidate_start/end paths for the common uncontended cases.
> some sort of backoff delay is needed to let one side or the other
> make some progress.
What the write side of the mmap_sem would do is force the CPU and
device to cleanly take turns. Once the device pages are registered
under the write side the CPU will have to wait in invalidate_start for
the driver to complete a shootdown, then the whole thing starts all
over again.
It is certainly imaginable something could have a 'min life' timer for
a device mapping and hold mm invalidate_start, and device pagefault
for that min time to promote better sharing.
But, if we don't use the mmap_sem then we can livelock and the device
will see an unrecoverable error from the timeout which means we have
risk that under load the system will simply obscurely fail. This seems
unacceptable to me..
Particularly since for the ODP use case the issue is not trashing
migration as a GPU might have, but simple system stability under swap
load. We do not want the ODP pagefault to permanently fail due to
timeout if the VMA is still valid..
Jason
next prev parent reply other threads:[~2019-06-07 20:44 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-06 18:44 [PATCH v2 hmm 00/11] Various revisions from a locking/code review Jason Gunthorpe
2019-06-06 18:44 ` [PATCH v2 hmm 01/11] mm/hmm: fix use after free with struct hmm in the mmu notifiers Jason Gunthorpe
2019-06-07 2:29 ` John Hubbard
2019-06-07 12:34 ` Jason Gunthorpe
2019-06-07 13:42 ` Jason Gunthorpe
2019-06-08 1:13 ` John Hubbard
2019-06-08 1:37 ` John Hubbard
2019-06-07 18:12 ` Ralph Campbell
2019-06-08 8:49 ` Christoph Hellwig
2019-06-08 11:33 ` Jason Gunthorpe
2019-06-06 18:44 ` [PATCH v2 hmm 02/11] mm/hmm: Use hmm_mirror not mm as an argument for hmm_range_register Jason Gunthorpe
2019-06-07 2:36 ` John Hubbard
2019-06-07 18:24 ` Ralph Campbell
2019-06-07 22:39 ` Ralph Campbell
2019-06-10 13:09 ` Jason Gunthorpe
2019-06-07 22:33 ` Ira Weiny
2019-06-08 8:54 ` Christoph Hellwig
2019-06-11 19:44 ` Jason Gunthorpe
2019-06-12 7:12 ` Christoph Hellwig
2019-06-12 11:41 ` Jason Gunthorpe
2019-06-12 12:11 ` Christoph Hellwig
2019-06-06 18:44 ` [PATCH v2 hmm 03/11] mm/hmm: Hold a mmgrab from hmm to mm Jason Gunthorpe
2019-06-07 2:44 ` John Hubbard
2019-06-07 12:36 ` Jason Gunthorpe
2019-06-07 18:41 ` Ralph Campbell
2019-06-07 18:51 ` Jason Gunthorpe
2019-06-07 22:38 ` Ira Weiny
2019-06-06 18:44 ` [PATCH v2 hmm 04/11] mm/hmm: Simplify hmm_get_or_create and make it reliable Jason Gunthorpe
2019-06-07 2:54 ` John Hubbard
2019-06-07 18:52 ` Ralph Campbell
2019-06-07 22:44 ` Ira Weiny
2019-06-06 18:44 ` [PATCH v2 hmm 05/11] mm/hmm: Remove duplicate condition test before wait_event_timeout Jason Gunthorpe
2019-06-07 3:06 ` John Hubbard
2019-06-07 12:47 ` Jason Gunthorpe
2019-06-07 13:31 ` [PATCH v3 " Jason Gunthorpe
2019-06-07 22:55 ` Ira Weiny
2019-06-08 1:32 ` John Hubbard
2019-06-07 19:01 ` [PATCH v2 " Ralph Campbell
2019-06-07 19:13 ` Jason Gunthorpe
2019-06-07 20:21 ` Ralph Campbell
2019-06-07 20:44 ` Jason Gunthorpe [this message]
2019-06-07 22:13 ` Ralph Campbell
2019-06-08 1:47 ` Jason Gunthorpe
2019-06-06 18:44 ` [PATCH v2 hmm 06/11] mm/hmm: Hold on to the mmget for the lifetime of the range Jason Gunthorpe
2019-06-07 3:15 ` John Hubbard
2019-06-07 20:29 ` Ralph Campbell
2019-06-06 18:44 ` [PATCH v2 hmm 07/11] mm/hmm: Use lockdep instead of comments Jason Gunthorpe
2019-06-07 3:19 ` John Hubbard
2019-06-07 20:31 ` Ralph Campbell
2019-06-07 22:16 ` Souptick Joarder
2019-06-06 18:44 ` [PATCH v2 hmm 08/11] mm/hmm: Remove racy protection against double-unregistration Jason Gunthorpe
2019-06-07 3:29 ` John Hubbard
2019-06-07 13:57 ` Jason Gunthorpe
2019-06-07 20:33 ` Ralph Campbell
2019-06-06 18:44 ` [PATCH v2 hmm 09/11] mm/hmm: Poison hmm_range during unregister Jason Gunthorpe
2019-06-07 3:37 ` John Hubbard
2019-06-07 14:03 ` Jason Gunthorpe
2019-06-07 20:46 ` Ralph Campbell
2019-06-07 20:49 ` Jason Gunthorpe
2019-06-07 23:01 ` Ira Weiny
2019-06-06 18:44 ` [PATCH v2 hmm 10/11] mm/hmm: Do not use list*_rcu() for hmm->ranges Jason Gunthorpe
2019-06-07 3:40 ` John Hubbard
2019-06-07 20:49 ` Ralph Campbell
2019-06-07 22:11 ` Souptick Joarder
2019-06-07 23:02 ` Ira Weiny
2019-06-06 18:44 ` [PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release Jason Gunthorpe
2019-06-07 3:47 ` John Hubbard
2019-06-07 12:58 ` Jason Gunthorpe
2019-06-07 21:37 ` Ralph Campbell
2019-06-08 2:12 ` Jason Gunthorpe
2019-06-10 16:02 ` Jason Gunthorpe
2019-06-10 22:03 ` Ralph Campbell
2019-06-07 16:05 ` [PATCH v2 12/11] mm/hmm: Fix error flows in hmm_invalidate_range_start Jason Gunthorpe
2019-06-07 23:52 ` Ralph Campbell
2019-06-08 1:35 ` Jason Gunthorpe
2019-06-11 19:48 ` [PATCH v2 hmm 00/11] Various revisions from a locking/code review Jason Gunthorpe
2019-06-12 17:54 ` Kuehling, Felix
2019-06-12 21:49 ` Yang, Philip
2019-06-13 17:50 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190607204427.GU14802@ziepe.ca \
--to=jgg@ziepe.ca \
--cc=Felix.Kuehling@amd.com \
--cc=aarcange@redhat.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=jglisse@redhat.com \
--cc=jhubbard@nvidia.com \
--cc=linux-mm@kvack.org \
--cc=linux-rdma@vger.kernel.org \
--cc=rcampbell@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).