From: Jerome Glisse <j.glisse@gmail.com>
To: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
"joro@8bytes.org" <joro@8bytes.org>, Mel Gorman <mgorman@suse.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Johannes Weiner <jweiner@redhat.com>,
Larry Woodman <lwoodman@redhat.com>,
Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
Brendan Conoboy <blc@redhat.com>,
Joe Donohue <jdonohue@redhat.com>,
Duncan Poole <dpoole@nvidia.com>,
Sherry Cheung <SCheung@nvidia.com>,
Subhash Gutti <sgutti@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>,
Lucien Dunning <ldunning@nvidia.com>,
Cameron Buschardt <cabuschardt@nvidia.com>,
Arvind Gopalakrishnan <arvindg@nvidia.com>,
Haggai
Subject: Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.
Date: Mon, 15 Jun 2015 10:32:16 -0400 [thread overview]
Message-ID: <20150615143215.GA1947@gmail.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1506111520350.25907@mdh-linux64-2.nvidia.com>
On Thu, Jun 11, 2015 at 03:26:46PM -0700, Mark Hairgrove wrote:
> On Thu, 11 Jun 2015, Jerome Glisse wrote:
> > On Wed, Jun 10, 2015 at 06:15:08PM -0700, Mark Hairgrove wrote:
[...]
> > Ok i see the race you are afraid of and really it is an unlikely one
> > __mutex_unlock_common_slowpath() take a spinlock right after allowing
> > other to take the mutex, when we are in your scenario there is no
> > contention on that spinlock so it is taken right away and as there
> > is no one in the mutex wait list then it goes directly to unlock the
> > spinlock and return. You can ignore the debug function as if debugging
> > is enabled than the mutex_lock() would need to also take the spinlock
> > and thus you would have proper synchronization btw 2 thread thanks to
> > the mutex.wait_lock.
> >
> > So basicly while CPU1 is going :
> > spin_lock(mutex.wait_lock)
> > if (!list_empty(mutex.wait_list)) {
> > // wait_list is empty so branch not taken
> > }
> > spin_unlock(mutex.wait_lock)
> >
> > CPU2 would have to test the mirror list and mutex_unlock and return
> > before the spin_unlock() of CPU1. This is a tight race, i can add a
> > synchronize_rcu() to device_unregister after the mutex_unlock() so
> > that we also add a grace period before the device is potentialy freed
> > which should make that race completely unlikely.
> >
> > Moreover for something really bad to happen it would need that the
> > freed memory to be reallocated right away by some other thread. Which
> > really sound unlikely unless CPU1 is the slowest of all :)
> >
> > Cheers,
> > Jérôme
> >
>
> But CPU1 could get preempted between the atomic_set and the
> spin_lock_mutex, and then it doesn't matter whether or not a grace period
> has elapsed before CPU2 proceeds.
>
> Making race conditions less likely just makes them harder to pinpoint when
> they inevitably appear in the wild. I don't think it makes sense to spend
> any effort in making a race condition less likely, and that thread I
> referenced (https://lkml.org/lkml/2013/12/2/997) is fairly strong evidence
> that fixing this race actually matters. So, I think this race condition
> really needs to be fixed.
>
> One fix is for hmm_mirror_unregister to wait for hmm_notifier_release
> completion between hmm_mirror_kill and hmm_mirror_unref. It can do this by
> calling synchronize_srcu() on the mmu_notifier's srcu. This has the
> benefit that the driver is guaranteed not to get the "mm is dead" callback
> after hmm_mirror_unregister returns.
>
> In fact, are there any callbacks on the mirror that can arrive after
> hmm_mirror_unregister? If so, how will hmm_device_unregister solve them?
>
> From a general standpoint, hmm_device_unregister must perform some kind of
> synchronization to be sure that all mirrors are completely released and
> done and no new callbacks will trigger. Since that has to be true, can't
> that synchronization be moved into hmm_mirror_unregister instead?
>
> If that happens there's no need for a "mirror can be freed" ->release
> callback at all because the driver is guaranteed that a mirror is done
> after hmm_mirror_unregister.
Well there is no need or 2 callback (relase|stop , free) just one, the
release|stop that is needed. I kind of went halfway last week on this.
I will probably rework that a little to keep just one call and rely on
driver to call hmm_mirror_unregister()
Cheers,
Jérôme
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-06-15 14:32 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1432236705-4209-1-git-send-email-j.glisse@gmail.com>
2015-05-21 19:31 ` [PATCH 05/36] HMM: introduce heterogeneous memory management v3 j.glisse
2015-05-27 5:50 ` Aneesh Kumar K.V
2015-05-27 14:38 ` Jerome Glisse
2015-06-08 19:40 ` Mark Hairgrove
[not found] ` <alpine.DEB.2.00.1506081222270.27796-ptWJzH4JGIzJt4XymMeBgkn48jw8i0AO@public.gmane.org>
2015-06-08 21:17 ` Jerome Glisse
2015-06-09 1:54 ` Mark Hairgrove
[not found] ` <alpine.DEB.2.00.1506081841490.1802-ptWJzH4JGIzJt4XymMeBgkn48jw8i0AO@public.gmane.org>
2015-06-09 15:56 ` Jerome Glisse
2015-06-10 3:33 ` Mark Hairgrove
2015-06-10 15:42 ` Jerome Glisse
2015-06-11 1:15 ` Mark Hairgrove
2015-06-11 14:23 ` Jerome Glisse
2015-06-11 22:26 ` Mark Hairgrove
2015-06-15 14:32 ` Jerome Glisse [this message]
[not found] ` <1432239792-5002-1-git-send-email-jglisse@redhat.com>
2015-05-21 20:23 ` [PATCH 30/36] IB/mlx5: add a new paramter to mlx5_ib_update_mtt() for ODP with HMM jglisse
2015-05-21 20:23 ` [PATCH 31/36] IB/odp: export rbt_ib_umem_for_each_in_range() jglisse
2015-05-21 20:23 ` [PATCH 32/36] IB/odp/hmm: add new kernel option to use HMM for ODP jglisse
2015-05-21 20:23 ` [PATCH 33/36] IB/odp/hmm: add core infiniband structure and helper for ODP with HMM jglisse
2015-06-24 13:59 ` Haggai Eran
2015-05-21 20:23 ` [PATCH 34/36] IB/mlx5/hmm: add mlx5 HMM device initialization and callback jglisse
2015-05-21 20:23 ` [PATCH 35/36] IB/mlx5/hmm: add page fault support for ODP on HMM jglisse
2015-05-21 20:23 ` [PATCH 36/36] IB/mlx5/hmm: enable ODP using HMM jglisse
[not found] <1432236233-4035-1-git-send-email-j.glisse@gmail.com>
[not found] ` <1432236233-4035-1-git-send-email-j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-05-21 19:23 ` [PATCH 05/36] HMM: introduce heterogeneous memory management v3 j.glisse-Re5JQEeQqe8AvxtiuMwx3w
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150615143215.GA1947@gmail.com \
--to=j.glisse@gmail.com \
--cc=SCheung@nvidia.com \
--cc=aarcange@redhat.com \
--cc=airlied@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=arvindg@nvidia.com \
--cc=blc@redhat.com \
--cc=cabuschardt@nvidia.com \
--cc=dpoole@nvidia.com \
--cc=hpa@zytor.com \
--cc=jdonohue@redhat.com \
--cc=jhubbard@nvidia.com \
--cc=joro@8bytes.org \
--cc=jweiner@redhat.com \
--cc=ldunning@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lwoodman@redhat.com \
--cc=mgorman@suse.de \
--cc=mhairgrove@nvidia.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=sgutti@nvidia.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox