From: Jerome Glisse <j.glisse@gmail.com>
To: Davidlohr Bueso <davidlohr@hp.com>
Cc: sagi grimberg <sagig@mellanox.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linda Wang <lwang@redhat.com>, Kevin E Martin <kem@redhat.com>,
Jerome Glisse <jglisse@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Johannes Weiner <jweiner@redhat.com>,
Larry Woodman <lwoodman@redhat.com>,
Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
Jeff Law <law@redhat.com>, Brendan Conoboy <blc@redhat.com>,
Joe Donohue <jdonohue@redhat.com>,
Duncan Poole <dpoole@nvidia.com>,
Sherry Cheung <SCheung@nvidia.com>,
Subhash Gutti <sgutti@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>,
Mark Hairgrove <mhairgrove@nvidia.com>,
Lucien Dunning <ldunning@nvidia.com>, Cameron Buschardt <
Subject: Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).
Date: Thu, 8 May 2014 21:45:45 -0400
Message-ID: <20140509014544.GB2906@gmail.com>
In-Reply-To: <1399599734.2497.2.camel@buesod1.americas.hpqcorp.net>
On Thu, May 08, 2014 at 06:42:14PM -0700, Davidlohr Bueso wrote:
> On Thu, 2014-05-08 at 13:56 -0400, Jerome Glisse wrote:
> > On Thu, May 08, 2014 at 07:47:04PM +0300, sagi grimberg wrote:
> > > On 5/7/2014 5:33 AM, Davidlohr Bueso wrote:
> > > >On Tue, 2014-05-06 at 12:29 +0200, Peter Zijlstra wrote:
> > > >>So you forgot to CC Linus, Linus has expressed some dislike for
> > > >>preemptible mmu_notifiers in the recent past:
> > > >>
> > > >> https://lkml.org/lkml/2013/9/30/385
> > > >I'm glad this came up again.
> > > >
> > > >So I've been running benchmarks (mostly aim7, which nicely exercises our
> > > >locks) comparing my recent v4 for rwsem optimistic spinning against
> > > >previous implementation ideas for the anon-vma lock, mostly:
> > > >
> > > >- rwsem (currently)
> > > >- rwlock_t
> > > >- qrwlock_t
> > > >- rwsem+optspin
> > > >
> > > >Of course, *any* change provides significant improvement in throughput
> > > >for several workloads, by avoiding blocking -- there are more
> > > >performance numbers in the different patches. This is fairly obvious.
> > > >
> > > >What is perhaps not so obvious is that rwsem+optimistic spinning beats
> > > >all others, including the improved qrwlock from Waiman and Peter. This
> > > >is mostly because of the idea of cancelable MCS, which was mimicked from
> > > >mutexes. The delta in most cases is around +10-15%, which is
> > > >non-trivial.
> > >
> > > This is great news, David!
> > >
> > > >I mention this because from a performance PoV, we'll stop caring so much
> > > >about the type of lock we require in the notifier related code. So while
> > > >this is not conclusive, I'm not as opposed to keeping the locks blocking
> > > >as I once was. Now this might still imply things like poor design
> > > >choices, but that's neither here nor there.
> > >
> > > So is the rwsem+opt strategy the way to go, given it keeps everyone happy?
> > > We will be more than satisfied with it, as it will allow us to
> > > guarantee device MMU updates.
> > >
> > > >/me sees Sagi smiling ;)
> > >
> > > :)
> >
> > So I started doing things with TLB flush, but I must say things look ugly.
> > I need a new page flag (goodbye 32-bit platforms) and I need my own LRU and
> > page reclamation for any page in use by a device; I need to hook up inside
> > try_to_unmap or migrate (but I will do the former). I am trying to be smart
> > by scheduling a worker on another CPU before sending the IPI, so that while
> > the IPI is in progress another CPU can hopefully schedule the invalidation
> > on the GPU, and the wait for the GPU after the IPI will be quick.
> >
> > So all in all this is looking ugly, and it does not change the fact that I
> > sleep (well, need to be able to sleep). It just moves the sleeping to another
> > part.
> >
> > Maybe I should stress that with the mmu_notifier version it only sleeps for
> > processes that are using the GPU. Those processes use userspace APIs like
> > OpenCL, which do not play well with fork; i.e. do not use fork if you are
> > using such an API.
> >
> > So for my case, if a process has mm->hmm set to something, that would mean
> > that there is a GPU using that address space and that it is unlikely to
> > undergo the massive workload that people try to optimize the anon_vma
> > lock for.
> >
> > My point is that with rwsem+optspin it could try spinning if mm->hmm
> > was NULL and make the massive fork workload go fast, or it could sleep
> > directly if mm->hmm is set.
>
> Sorry? Unless I'm misunderstanding you, we don't do such things. Our
> locks are generic and need to work for any circumstance, no special
> cases here and there... _especially_ with these kinds of things. So no,
> rwsem will spin as long as the owner is set, just like for any other user.
>
> Thanks,
> Davidlohr
>
I do not mind spinning all the time; I was just thinking that it could be
optimized away when there is an hmm for the current mm, as that means there is
very likely going to be a schedule inside the mmu_notifier anyway.
But if you prefer to keep the code generic, I am fine with wasting CPU cycles.
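
To make the two options concrete, here is a rough userspace sketch of the
decision being discussed. It is not kernel code and not taken from any patch:
struct mm_sketch, struct hmm_sketch, generic_policy() and hmm_aware_policy()
are all made-up names for illustration, and "owner_set" just stands for the
"spin as long as the owner is set" rule quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel structures. */
struct hmm_sketch {
    int unused;
};

struct mm_sketch {
    struct hmm_sketch *hmm; /* non-NULL when a device mirrors this address space */
};

enum wait_policy { WAIT_SPIN, WAIT_SLEEP };

/* Generic rule: spin as long as the lock owner is set, for every user alike. */
static enum wait_policy generic_policy(bool owner_set)
{
    return owner_set ? WAIT_SPIN : WAIT_SLEEP;
}

/*
 * The special case floated in this thread (assumption, never implemented here):
 * if mm->hmm is set, a schedule inside the mmu_notifier is very likely anyway,
 * so skip the spin and sleep directly. Otherwise fall back to the generic rule.
 */
static enum wait_policy hmm_aware_policy(const struct mm_sketch *mm, bool owner_set)
{
    assert(mm != NULL);
    if (mm->hmm != NULL)
        return WAIT_SLEEP;
    return generic_policy(owner_set);
}
```

With the generic rule a contended waiter always spins while the owner is set;
the hmm-aware variant only differs for an mm with hmm set, where it sleeps
right away. Keeping the lock generic means just using the first function.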
Cheers,
Jérôme Glisse
Thread overview: 48+ messages
2014-05-02 13:51 [RFC] Heterogeneous memory management (mirror process address space on a device mmu) j.glisse
2014-05-02 13:52 ` [PATCH 01/11] mm: differentiate unmap for vmscan from other unmap j.glisse
2014-05-02 13:52 ` [PATCH 02/11] mmu_notifier: add action information to address invalidation j.glisse
2014-05-02 13:52 ` [PATCH 03/11] mmu_notifier: pass through vma to invalidate_range and invalidate_page j.glisse
2014-05-02 13:52 ` [PATCH 04/11] interval_tree: helper to find previous item of a node in rb interval tree j.glisse
2014-05-02 13:52 ` [PATCH 05/11] mm/memcg: support accounting null page and transfering null charge to new page j.glisse
2014-05-02 13:52 ` [PATCH 06/11] hmm: heterogeneous memory management j.glisse
2014-05-02 13:52 ` [PATCH 07/11] hmm: support moving anonymous page to remote memory j.glisse
2014-05-02 13:52 ` [PATCH 08/11] hmm: support for migrate file backed pages " j.glisse
2014-05-02 13:52 ` [PATCH 09/11] fs/ext4: add support for hmm migration to remote memory of pagecache j.glisse
2014-05-02 13:52 ` [PATCH 10/11] hmm/dummy: dummy driver to showcase the hmm api j.glisse
2014-05-02 13:52 ` [PATCH 11/11] hmm/dummy_driver: add support for fake remote memory using pages j.glisse
2014-05-06 10:29 ` [RFC] Heterogeneous memory management (mirror process address space on a device mmu) Peter Zijlstra
2014-05-06 14:57 ` Linus Torvalds
2014-05-06 15:00 ` Jerome Glisse
2014-05-06 15:18 ` Linus Torvalds
2014-05-06 15:33 ` Jerome Glisse
2014-05-06 15:42 ` Rik van Riel
2014-05-06 15:47 ` Linus Torvalds
2014-05-06 16:18 ` Jerome Glisse
2014-05-06 16:32 ` Linus Torvalds
2014-05-06 16:49 ` Jerome Glisse
2014-05-06 17:28 ` Jerome Glisse
2014-05-06 17:43 ` Linus Torvalds
2014-05-06 18:13 ` Jerome Glisse
2014-05-06 18:22 ` Linus Torvalds
2014-05-06 18:38 ` Jerome Glisse
2014-05-07 7:18 ` Benjamin Herrenschmidt
2014-05-07 7:14 ` Benjamin Herrenschmidt
2014-05-07 12:39 ` Jerome Glisse
2014-05-09 1:26 ` Jerome Glisse
2014-05-10 4:28 ` Benjamin Herrenschmidt
2014-05-11 0:48 ` Jerome Glisse
2014-05-06 16:30 ` Rik van Riel
2014-05-06 16:34 ` Linus Torvalds
2014-05-06 16:47 ` Rik van Riel
2014-05-06 16:54 ` Jerome Glisse
2014-05-06 18:02 ` H. Peter Anvin
2014-05-06 18:26 ` Jerome Glisse
2014-05-06 22:44 ` David Airlie
2014-05-07 2:33 ` Davidlohr Bueso
2014-05-07 13:00 ` Peter Zijlstra
2014-05-07 17:34 ` Davidlohr Bueso
2014-05-07 16:21 ` Linus Torvalds
2014-05-08 16:47 ` sagi grimberg
2014-05-08 17:56 ` Jerome Glisse
2014-05-09 1:42 ` Davidlohr Bueso
2014-05-09 1:45 ` Jerome Glisse [this message]